










AN INTRODUCTION 
TO 

LINEAR ALGEBRA 






BY 

L. MIRSKY 


LECTURER IN MATHEMATICS IN THE
UNIVERSITY OF SHEFFIELD


OXFORD 

AT THE CLARENDON PRESS 
1955 



Oxford University Press, Amen House, London E.C.4

GLASGOW NEW YORK TORONTO MELBOURNE WELLINGTON
BOMBAY CALCUTTA MADRAS KARACHI CAPE TOWN IBADAN 

Geoffrey Cumberlege, Publisher to the University


PRINTED IN GREAT BRITAIN 



PREFACE 


My object in writing this book has been to provide an elementary 
and easily readable account of linear algebra. The book is intended 
mainly for students pursuing an honours course in mathematics, 
but I hope that the exposition is sufficiently simple to make it 
equally useful to readers whose principal interests lie in the fields 
of physics or technology. The material dealt with here is not 
extensive and, broadly speaking, only those topics are discussed 
which normally form part of the honours mathematics syllabus in 
British universities. Within this compass I have attempted to 
present a systematic and rigorous development of the subject. 
The account is self-contained, and the reader is not assumed to 
have any previous knowledge of linear algebra, although some 
slight acquaintance with the elementary theory of determinants 
will be found helpful. 

It is not easy to estimate what level of abstractness best suits 
a textbook of linear algebra. Since I have aimed, above all, at 
simplicity of presentation I have decided on a thoroughly concrete 
treatment, at any rate in the initial stages of the discussion. Thus 
I operate throughout with real and complex numbers, and I 
define a vector as an ordered set of numbers and a matrix as a 
rectangular array of numbers. After the first three chapters, 
however, a new and more abstract point of view becomes prominent. 
Linear manifolds (i.e. abstract vector spaces) are considered, and 
the algebra of matrices is then recognized to be the appropriate 
tool for investigating the properties of linear operators; in fact, 
particular stress is laid on the representation of linear operators by 
matrices. In this way the reader is led gradually towards the 
fundamental concept of invariant characterization. 

The points of contact between linear algebra and geometry are 
numerous, and I have taken every opportunity of bringing them to 
the reader’s notice. I have not, of course, sought to provide a
systematic discussion of the algebraic background of geometry, but
have rather concentrated on a few special topics, such as changes 
of the coordinate system, reduction of quadrics to principal axes, 
rotations in the plane and in space, and the classification of 
quadrics under the projective and affine groups. 





The theory of matrices gives rise to many striking inequalities. 
The proofs of these are generally very simple, but are widely 
scattered throughout the literature and are often not easily 
accessible. I have here attempted to collect together, with proofs, 
all the better known inequalities of matrix theory. I have also 
included a brief sketch of the theory of matrix power series, a topic 
of considerable interest and elegance not normally dealt with in 
elementary textbooks. 

Numerous exercises are incorporated in the text. They are 
designed not so much to test the reader’s ingenuity as to direct his 
attention to analogues, generalizations, alternative proofs, and so 
on. The reader is recommended to work through these exercises, 
as the results embodied in them are frequently used in the
subsequent discussion. At the end of each chapter there is a series of
miscellaneous problems arranged approximately in order of
increasing difficulty. Some of these involve only routine calculations,
others call for some manipulative skill, and yet others carry the 
general theory beyond the stage reached in the text. A number 
of these problems have been taken from recent examination papers 
in mathematics, and thanks for permission to use them are due to 
the Delegates of the Clarendon Press, the Syndics of the Cambridge 
University Press, and the Universities of Bristol, London,
Liverpool, Manchester, and Sheffield.

The number of existing books on linear algebra is large, and it is 
therefore difficult to make a detailed acknowledgement of sources. 
I ought, however, to mention Turnbull and Aitken, An Introduction 
to the Theory of Canonical Matrices, and MacDuffee, The Theory 
of Matrices, on both of which I have drawn heavily for historical 
references. 

I have received much help from a number of friends and 
colleagues. Professor A. G. Walker first suggested that I should 
write a book on linear algebra and his encouragement has been 
invaluable. Mr. H. Burkill, Mr. A. R. Curtis, Dr. C. S. Davis, 
Dr. H. K. Farahat, Dr. Christine M. Hamill, Professor H. A.
Heilbronn, Professor D. G. Northcott, and Professor A. Oppenheim
have all helped me in a variety of ways, by checking parts of the
manuscript or advising me on specific points. Mr. J. C. Shepherdson
read an early version of the manuscript and his acute comments 
have enabled me to remove many obscurities and ambiguities; he 
has, in addition, given me considerable help with Chapters IX and X.
The greatest debt I owe is to Dr. G. T. Kneebone and Professor
R. Rado with both of whom, for several years past, I have been 
in the habit of discussing problems of linear algebra and their 
presentation to students. But for these conversations I should not 
have been able to write the book. Dr. Kneebone has also read 
and criticized the manuscript at every stage of preparation and 
Professor Rado has supplied me with several of the proofs and 
problems which appear in the text. Finally, I wish to record my 
thanks to the officers of the Clarendon Press for their helpful 
co-operation. 




CONTENTS 


PART I 

DETERMINANTS, VECTORS, MATRICES, AND 
LINEAR EQUATIONS 

I. DETERMINANTS 

1.1. Arrangements and the ε-symbol 1

1.2. Elementary properties of determinants 5 

1.3. Multiplication of determinants 12 

1.4. Expansion theorems 14 

1.5. Jacobi’s theorem 24 

1.6. Two special theorems on linear equations 27 

II. VECTOR SPACES AND LINEAR MANIFOLDS 

2.1. The algebra of vectors 39 

2.2. Linear manifolds 43 

2.3. Linear dependence and bases 48 

2.4. Vector representation of linear manifolds 57 

2.5. Inner products and orthonormal bases 62 

III. THE ALGEBRA OF MATRICES 

3.1. Elementary algebra 72 

3.2. Preliminary notions concerning matrices 74 

3.3. Addition and multiplication of matrices 78 

3.4. Application of matrix technique to linear substitutions 85 

3.5. Adjugate matrices 87 

3.6. Inverse matrices 90 

3.7. Rational functions of a square matrix 97 

3.8. Partitioned matrices 100 

IV. LINEAR OPERATORS 

4.1. Change of basis in a linear manifold 111 

4.2. Linear operators and their representations 113

4.3. Isomorphisms and automorphisms of linear manifolds 123 

4.4. Further instances of linear operators 126 

V. SYSTEMS OF LINEAR EQUATIONS AND RANK OF 
MATRICES 

5.1. Preliminary results 131

5.2. The rank theorem 136

5.3. The general theory of linear equations 140

5.4. Systems of homogeneous linear equations 148

5.5. Miscellaneous applications 162

5.6. Further theorems on rank of matrices 168

VI. ELEMENTARY OPERATIONS AND THE CONCEPT 
OF EQUIVALENCE 

6.1. E-operations and E-matrices 168 

6.2. Equivalent matrices 172 

6.3. Applications of the preceding theory 178 

6.4. Congruence transformations 182

6.5. The general concept of equivalence 186

6.6. Axiomatic characterization of determinants 189 


PART II 

FURTHER DEVELOPMENT OF MATRIX THEORY 


VII. THE CHARACTERISTIC EQUATION 

7.1. The coefficients of the characteristic polynomial 195 

7.2. Characteristic polynomials and similarity transformations 199 

7.3. Characteristic roots of rational functions of matrices 201 

7.4. The minimum polynomial and the theorem of Cayley and Hamilton 202

7.5. Estimates of characteristic roots 208

7.6. Characteristic vectors 214 

VIII. ORTHOGONAL AND UNITARY MATRICES 

8.1. Orthogonal matrices 222 

8.2. Unitary matrices 229 

8.3. Rotations in the plane 233

8.4. Rotations in space 236 

IX. GROUPS 

9.1. The axioms of group theory 252 

9.2. Matrix groups and operator groups 261 

9.3. Representation of groups by matrices 267 

9.4. Groups of singular matrices 272 

9.5. Invariant spaces and groups of linear transformations 2'ft

X. CANONICAL FORMS 

10.1. The idea of a canonical form 290 

10.2. Diagonal canonical forms under the similarity group 292 

10.3. Diagonal canonical forms under the orthogonal similarity group and the unitary similarity group 300




10.4. Triangular canonical forms 306 

10.5. An intermediate canonical form 312 

10.6. Simultaneous similarity transformations 316 

XI. MATRIX ANALYSIS 

11.1. Convergent matrix sequences 327 

11.2. Power series and matrix functions 330 

11.3. The relation between matrix functions and matrix polynomials 341

11.4. Systems of linear differential equations 343 

PART III 

QUADRATIC FORMS 

XII. BILINEAR, QUADRATIC, AND HERMITIAN 
FORMS 

12.1. Operators and forms of the bilinear and quadratic types 353 

12.2. Orthogonal reduction to diagonal form 362 

12.3. General reduction to diagonal form 367 

12.4. The problem of equivalence. Rank and signature 375 

12.5. Classification of quadrics 380 

12.6. Hermitian forms 385 

XIII. DEFINITE AND INDEFINITE FORMS 

13.1. The value classes 394 

13.2. Transformations of positive definite forms 398 

13.3. Determinantal criteria 400 

13.4. Simultaneous reduction of two quadratic forms 408 

13.5. The inequalities of Hadamard, Minkowski, Fischer, and Oppenheim 416

BIBLIOGRAPHY 427 

INDEX 429 




PART I 


DETERMINANTS, VECTORS, MATRICES, 
AND LINEAR EQUATIONS 


DETERMINANTS 

The present book is intended to give a systematic account of the 
elementary parts of linear algebra. The technique best suited to 
this branch of mathematics is undoubtedly that provided by the 
calculus of matrices, to which much of the book is devoted, but we 
shall also require to make considerable use of the theory of
determinants, partly for theoretical purposes and partly as an aid to
computation. In this opening chapter we shall develop the principal
properties of determinants to the extent to which they are needed
for the treatment of linear algebra.†

The theory of determinants was, indeed, the first topic in linear 
algebra to be studied intensively. It was initiated by Leibnitz 
in 1696, developed further by Bezout, Vandermonde, Cramer, 
Lagrange, and Laplace, and given the form with which we are now 
familiar by Cauchy, Jacobi, and Sylvester in the first half of the 
nineteenth century. The term ‘determinant’ occurs for the first 
time in Gauss’s Disquisitiones arithmeticae (1801).‡

1.1. Arrangements and the ε-symbol

In order to define determinants it is necessary to refer to
arrangements among a set of numbers, and the theory of determinants
can be based on a few simple results concerning such arrangements. 
In the present section we shall therefore derive the requisite 
preliminary results. 

1.1.1. We shall denote by $(\lambda_1,\dots,\lambda_n)$ the ordered set consisting
of the integers $\lambda_1,\dots,\lambda_n$.

† For a much more detailed discussion of determinants see Kowalewski,
Einführung in die Determinantentheorie. Briefer accounts will be found in Burnside
and Panton, The Theory of Equations, and in Ferrar, 2, Aitken, 10, and Perron, 12.
(Numbers in bold-face type refer to the bibliography at the end.)

‡ For historical and bibliographical information see Muir, The Theory of
Determinants in the Historical Order of Development.


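Definition 1.1.1. If $(\lambda_1,\dots,\lambda_n)$ and $(\mu_1,\dots,\mu_n)$ contain the same
(distinct) integers, but these integers do not necessarily occur in the
same order, then $(\lambda_1,\dots,\lambda_n)$ and $(\mu_1,\dots,\mu_n)$ are said to be
ARRANGEMENTS† of each other. In symbols: $(\lambda_1,\dots,\lambda_n) = \mathscr{A}(\mu_1,\dots,\mu_n)$ or
$(\mu_1,\dots,\mu_n) = \mathscr{A}(\lambda_1,\dots,\lambda_n)$.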

We shall for the most part be concerned with arrangements of the
first $n$ positive integers. If $(\nu_1,\dots,\nu_n) = \mathscr{A}(1,\dots,n)$ and $(k_1,\dots,k_n)
= \mathscr{A}(1,\dots,n)$, then clearly $(\nu_{k_1},\dots,\nu_{k_n}) = \mathscr{A}(1,\dots,n)$. We have the
following result.

Theorem 1.1.1. (i) Let $(\nu_1,\dots,\nu_n)$ vary over all arrangements of
$(1,\dots,n)$, and let $(k_1,\dots,k_n)$ be a fixed arrangement of $(1,\dots,n)$. Then
$(\nu_{k_1},\dots,\nu_{k_n})$ varies over all arrangements of $(1,\dots,n)$.

(ii) Let $(\nu_1,\dots,\nu_n)$ vary over all arrangements of $(1,\dots,n)$, and let
$(\mu_1,\dots,\mu_n)$ be a fixed arrangement of $(1,\dots,n)$. The arrangement
$(\lambda_1,\dots,\lambda_n)$, defined by the conditions
$$\nu_{\lambda_1} = \mu_1,\ \dots,\ \nu_{\lambda_n} = \mu_n,$$
then varies over all arrangements of $(1,\dots,n)$.

This theorem is almost obvious. To prove (i), suppose that for
two different choices of $(\nu_1,\dots,\nu_n)$, say $(\alpha_1,\dots,\alpha_n)$ and $(\beta_1,\dots,\beta_n)$,
$(\nu_{k_1},\dots,\nu_{k_n})$ is the same arrangement, i.e.
$$(\alpha_{k_1},\dots,\alpha_{k_n}) = (\beta_{k_1},\dots,\beta_{k_n}),$$
and so
$$\alpha_{k_1} = \beta_{k_1},\ \dots,\ \alpha_{k_n} = \beta_{k_n}.$$
These relations are, in fact, the same as
$$\alpha_1 = \beta_1,\ \dots,\ \alpha_n = \beta_n,$$
although they are stated in a different order. The two arrangements
are thus identical, contrary to hypothesis. It therefore
follows that, as $(\nu_1,\dots,\nu_n)$ varies over the $n!$ arrangements of $(1,\dots,n)$,
$(\nu_{k_1},\dots,\nu_{k_n})$ also varies, without repetition, over $n!$ arrangements of
$(1,\dots,n)$. Hence $(\nu_{k_1},\dots,\nu_{k_n})$ varies, in fact, over all the $n!$
arrangements.

The second part of the theorem is established by the same type
of argument. Suppose that for two different choices of $(\nu_1,\dots,\nu_n)$,
say $(\alpha_1,\dots,\alpha_n)$ and $(\beta_1,\dots,\beta_n)$, $(\lambda_1,\dots,\lambda_n)$ is the same arrangement,
i.e.
$$\alpha_{\lambda_1} = \mu_1 = \beta_{\lambda_1},\ \dots,\ \alpha_{\lambda_n} = \mu_n = \beta_{\lambda_n}.$$
Then, since $(\lambda_1,\dots,\lambda_n)$ is an arrangement of $(1,\dots,n)$, we have
$(\alpha_1,\dots,\alpha_n) = (\beta_1,\dots,\beta_n)$, contrary to hypothesis, and the
assertion follows easily.

† We avoid the familiar term ‘permutation’ since this will be used in a
somewhat different sense in Chapter IX.

1.1.2. Definition 1.1.2. For all real values of $x$ the function
$\operatorname{sgn} x$ (read: signum $x$) is defined as
$$\operatorname{sgn} x = \begin{cases} 1 & (x > 0),\\ 0 & (x = 0),\\ -1 & (x < 0).\end{cases}$$

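Exercise 1.1.1. Show that
$$\operatorname{sgn} x \cdot \operatorname{sgn} y = \operatorname{sgn} xy,$$
and deduce that
$$\operatorname{sgn} x_1 \cdot \operatorname{sgn} x_2 \cdots \operatorname{sgn} x_k = \operatorname{sgn}(x_1 x_2 \cdots x_k).$$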

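Definition 1.1.3.

(i) $$\epsilon(\lambda_1,\dots,\lambda_n) = \operatorname{sgn} \prod_{1 \le r < s \le n} (\lambda_s - \lambda_r).\dagger$$

(ii) $$\epsilon\binom{\lambda_1,\dots,\lambda_n}{\mu_1,\dots,\mu_n} = \epsilon(\lambda_1,\dots,\lambda_n)\cdot\epsilon(\mu_1,\dots,\mu_n).\ddagger$$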

Exercise 1.1.2. Show that if $\lambda_1 < \dots < \lambda_n$, then $\epsilon(\lambda_1,\dots,\lambda_n) = 1$. Also
show that if any two $\lambda$'s are equal, then $\epsilon(\lambda_1,\dots,\lambda_n) = 0$.

Exercise 1.1.3. The interchange of two $\lambda$'s in $(\lambda_1,\dots,\lambda_n)$ is called a
transposition. Show that, if $(\lambda_1,\dots,\lambda_n) = \mathscr{A}(1,\dots,n)$, then it is possible to
obtain $(\lambda_1,\dots,\lambda_n)$ from $(1,\dots,n)$ by a succession of transpositions. Show,
furthermore, that if this process can be carried out by $s$ transpositions, then
$$\epsilon(\lambda_1,\dots,\lambda_n) = (-1)^s.$$
Deduce that, if the same process can also be carried out by $s'$ transpositions,
then $s$ and $s'$ are either both even or both odd.
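The ε-symbol and the parity statement of Exercise 1.1.3 are easy to test by machine. The Python sketch below is a numerical check, not a proof. It computes ε directly from Definition 1.1.3 (i) and compares it with $(-1)^s$, where $s$ counts the interchanges made by one particular procedure that builds each arrangement from $(1,\dots,n)$. That procedure is an illustrative assumption. The helper names `eps` and `transpositions_to_reach` are hypothetical.

```python
from itertools import permutations
from math import prod


def eps(lams):
    """Definition 1.1.3 (i): sgn of the product of (lam_s - lam_r) over r < s."""
    n = len(lams)
    p = prod(lams[s] - lams[r] for r in range(n) for s in range(r + 1, n))
    return (p > 0) - (p < 0)  # an empty product is 1, so eps of a 1-tuple is 1


def transpositions_to_reach(lams):
    """Count the transpositions used to turn (1, ..., n) into lams.

    Position i is fixed by interchanging the required value into place;
    each interchange is one transposition in the sense of Exercise 1.1.3.
    """
    current = sorted(lams)  # equals (1, ..., n) when lams is an arrangement of it
    count = 0
    for i, target in enumerate(lams):
        if current[i] != target:
            j = current.index(target)
            current[i], current[j] = current[j], current[i]
            count += 1
    return count


# Exercise 1.1.3: eps equals (-1)**s for every arrangement, n = 1, ..., 6.
for n in range(1, 7):
    for lams in permutations(range(1, n + 1)):
        assert eps(lams) == (-1) ** transpositions_to_reach(lams)

print(eps((1, 2, 3)), eps((2, 1, 3)), eps((1, 1, 2)))  # 1 -1 0
```

Any other sequence of transpositions producing the same arrangement has the same parity, which is what the exercise asks the reader to deduce.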

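Theorem 1.1.2. If $(\lambda_1,\dots,\lambda_n)$, $(\mu_1,\dots,\mu_n)$, and $(k_1,\dots,k_n)$ are
arrangements of $(1,\dots,n)$, then
$$\epsilon\binom{\lambda_{k_1},\dots,\lambda_{k_n}}{\mu_{k_1},\dots,\mu_{k_n}} = \epsilon\binom{\lambda_1,\dots,\lambda_n}{\mu_1,\dots,\mu_n}.$$

We may express this identity by saying that if $(\lambda_1,\dots,\lambda_n)$ and
$(\mu_1,\dots,\mu_n)$ are subjected to the same derangement, then the value of
$\epsilon\binom{\lambda_1,\dots,\lambda_n}{\mu_1,\dots,\mu_n}$ remains unaltered.

† Empty products are, as usual, defined to have the value 1. This implies, in
particular, that for $n = 1$ every ε-symbol is equal to 1.

‡ Definition 1.1.3 implies, of course, that
$\epsilon\binom{\lambda_1,\dots,\lambda_n}{\mu_1,\dots,\mu_n} = \epsilon\binom{\mu_1,\dots,\mu_n}{\lambda_1,\dots,\lambda_n}$.

To prove this we observe that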

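$$(\lambda_{k_j}-\lambda_{k_i})(\mu_{k_j}-\mu_{k_i}) = (\lambda_s-\lambda_r)(\mu_s-\mu_r) \qquad (1 \le i < j \le n), \qquad (1.1.1)$$
where
$$r = \min(k_i, k_j), \qquad s = \max(k_i, k_j). \qquad (1.1.2)$$
Now if $r, s$ (such that $1 \le r < s \le n$) are given, then there exist
unique integers $i, j$ (such that $1 \le i < j \le n$) satisfying (1.1.2).
Thus there is a biunique correspondence (i.e. a one-one correspondence)
between the pairs $k_i, k_j$ and the pairs $r, s$. Hence, by (1.1.1),
$$\prod_{1 \le i < j \le n} (\lambda_{k_j}-\lambda_{k_i})(\mu_{k_j}-\mu_{k_i}) = \prod_{1 \le r < s \le n} (\lambda_s-\lambda_r)(\mu_s-\mu_r).$$
Therefore, by Exercise 1.1.1,
$$\operatorname{sgn} \prod_{1 \le i < j \le n} (\lambda_{k_j}-\lambda_{k_i}) \cdot \operatorname{sgn} \prod_{1 \le i < j \le n} (\mu_{k_j}-\mu_{k_i}) = \operatorname{sgn} \prod_{1 \le r < s \le n} (\lambda_s-\lambda_r) \cdot \operatorname{sgn} \prod_{1 \le r < s \le n} (\mu_s-\mu_r),$$
i.e.
$$\epsilon\binom{\lambda_{k_1},\dots,\lambda_{k_n}}{\mu_{k_1},\dots,\mu_{k_n}} = \epsilon\binom{\lambda_1,\dots,\lambda_n}{\mu_1,\dots,\mu_n}.$$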
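Theorem 1.1.2 can be confirmed exhaustively for small $n$. The Python sketch below evaluates both sides for every choice of $(\lambda_1,\dots,\lambda_n)$, $(\mu_1,\dots,\mu_n)$ and $(k_1,\dots,k_n)$. It computes the two-row symbol as the product in Definition 1.1.3 (ii). The function names are hypothetical. The check covers only finitely many cases and is no substitute for the argument above.

```python
from itertools import permutations
from math import prod


def eps(lams):
    """Definition 1.1.3 (i)."""
    n = len(lams)
    p = prod(lams[s] - lams[r] for r in range(n) for s in range(r + 1, n))
    return (p > 0) - (p < 0)


def eps2(lams, mus):
    """Definition 1.1.3 (ii): the two-row symbol with lams on top, mus below."""
    return eps(lams) * eps(mus)


def reorder(seq, ks):
    """Return (seq_{k_1}, ..., seq_{k_n}), using 1-based k's as in the text."""
    return tuple(seq[k - 1] for k in ks)


def check_theorem_1_1_2(n):
    """Compare both sides of Theorem 1.1.2 for every triple of arrangements."""
    arrs = list(permutations(range(1, n + 1)))
    return all(
        eps2(reorder(lams, ks), reorder(mus, ks)) == eps2(lams, mus)
        for lams in arrs for mus in arrs for ks in arrs
    )


print(check_theorem_1_1_2(3), check_theorem_1_1_2(4))  # True True
```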

Theorem 1.1.3. Let $1 \le r < s \le n$. Then
$$\epsilon(1,\dots,r-1,s,r+1,\dots,s-1,r,s+1,\dots,n) = -1.$$

The expression on the left-hand side is, of course, simply
$\epsilon(1,2,\dots,n)$ with $r$ and $s$ interchanged. Denoting this expression
by $\epsilon(\lambda_1,\dots,\lambda_n)$, we observe that in the product
$$\prod_{1 \le i < j \le n} (\lambda_j - \lambda_i)$$
there are precisely $2(s-r-1)+1 = 2s-2r-1$ negative factors,
namely,
$$(r+1)-s,\ (r+2)-s,\ \dots,\ (s-1)-s,$$
$$r-(r+1),\ r-(r+2),\ \dots,\ r-(s-1),$$
$$r-s.$$
Hence $\epsilon(\lambda_1,\dots,\lambda_n) = (-1)^{2s-2r-1} = -1$, as asserted.
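The count of negative factors in the proof of Theorem 1.1.3 can be checked directly. In this Python sketch, `swapped_identity(n, r, s)` builds $(1,\dots,n)$ with $r$ and $s$ interchanged. `negative_factors` counts the factors $\lambda_j - \lambda_i$ ($i<j$) that are negative. Both helper names are illustrative.

```python
from math import prod


def eps(lams):
    """Definition 1.1.3 (i)."""
    n = len(lams)
    p = prod(lams[s] - lams[r] for r in range(n) for s in range(r + 1, n))
    return (p > 0) - (p < 0)


def swapped_identity(n, r, s):
    """(1, ..., n) with the entries r and s interchanged (1-based, r < s)."""
    seq = list(range(1, n + 1))
    seq[r - 1], seq[s - 1] = seq[s - 1], seq[r - 1]
    return tuple(seq)


def negative_factors(lams):
    """Number of negative factors lam_j - lam_i (i < j) in the product."""
    n = len(lams)
    return sum(1 for i in range(n) for j in range(i + 1, n) if lams[j] < lams[i])


for n in range(2, 9):
    for r in range(1, n):
        for s in range(r + 1, n + 1):
            lams = swapped_identity(n, r, s)
            assert negative_factors(lams) == 2 * s - 2 * r - 1
            assert eps(lams) == -1

print(swapped_identity(5, 2, 4), negative_factors(swapped_identity(5, 2, 4)))
# (1, 4, 3, 2, 5) 3
```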


The results obtained so far are sufficient for the discussion in 
§ 1.2 and § 1.3. The proof of Laplace’s expansion theorem in 
§ 1.4, however, presupposes a further identity. 

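Theorem 1.1.4. If $(r_1,\dots,r_n) = \mathscr{A}(1,\dots,n)$, $(s_1,\dots,s_n) = \mathscr{A}(1,\dots,n)$,
and $1 \le k < n$, then
$$\epsilon\binom{r_1,\dots,r_n}{s_1,\dots,s_n} = (-1)^{r_1+\dots+r_k+s_1+\dots+s_k}\,\epsilon\binom{r_1,\dots,r_k}{s_1,\dots,s_k}\,\epsilon\binom{r_{k+1},\dots,r_n}{s_{k+1},\dots,s_n}.$$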




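By Exercise 1.1.1 we have
$$\epsilon(r_1,\dots,r_n) = \operatorname{sgn} \prod_{1 \le i < j \le k} (r_j - r_i) \cdot \operatorname{sgn} \prod_{k+1 \le i < j \le n} (r_j - r_i) \cdot \operatorname{sgn} \prod_{i=1}^{k} \prod_{j=k+1}^{n} (r_j - r_i)$$
$$= \epsilon(r_1,\dots,r_k)\cdot\epsilon(r_{k+1},\dots,r_n)\cdot(-1)^{\nu_1+\dots+\nu_k}, \qquad (1.1.3)$$
where, for $1 \le i \le k$, $\nu_i$ denotes the number of numbers among
$r_{k+1},\dots,r_n$ which are smaller than $r_i$.

Let $r'_1,\dots,r'_k$ be defined by the relations
$$(r'_1,\dots,r'_k) = \mathscr{A}(r_1,\dots,r_k), \qquad r'_1 < \dots < r'_k,$$
and denote by $\nu'_i$ ($1 \le i \le k$) the number of numbers among
$r_{k+1},\dots,r_n$ which are smaller than $r'_i$. Since exactly $r'_i - 1$ of the
integers $1,\dots,n$ are smaller than $r'_i$, and $i-1$ of these are $r'_1,\dots,r'_{i-1}$, we have
$$\nu'_1 = r'_1 - 1,\quad \nu'_2 = r'_2 - 2,\quad \dots,\quad \nu'_k = r'_k - k,$$
$$\nu_1+\dots+\nu_k = \nu'_1+\dots+\nu'_k = r_1+\dots+r_k-\tfrac12 k(k+1),$$
and hence, by (1.1.3),
$$\epsilon(r_1,\dots,r_n) = (-1)^{r_1+\dots+r_k-\frac12 k(k+1)}\,\epsilon(r_1,\dots,r_k)\,\epsilon(r_{k+1},\dots,r_n).$$
Similarly
$$\epsilon(s_1,\dots,s_n) = (-1)^{s_1+\dots+s_k-\frac12 k(k+1)}\,\epsilon(s_1,\dots,s_k)\,\epsilon(s_{k+1},\dots,s_n),$$
and, since $k(k+1)$ is even, the theorem now follows at once by Definition 1.1.3 (ii).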
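The Python sketch below is a brute-force check of Theorem 1.1.4. It also checks the counting identity $\nu_1+\dots+\nu_k = r_1+\dots+r_k-\tfrac12 k(k+1)$ used in the proof. The function names are hypothetical. The check is exhaustive only for the small values of $n$ tried.

```python
from itertools import permutations
from math import prod


def eps(lams):
    """Definition 1.1.3 (i); also applied to the shorter tuples of Theorem 1.1.4."""
    n = len(lams)
    p = prod(lams[s] - lams[r] for r in range(n) for s in range(r + 1, n))
    return (p > 0) - (p < 0)


def nu_sum(rs, k):
    """nu_1 + ... + nu_k, where nu_i counts the r_j (j > k) smaller than r_i."""
    return sum(1 for i in range(k) for j in range(k, len(rs)) if rs[j] < rs[i])


def check_theorem_1_1_4(n):
    arrs = list(permutations(range(1, n + 1)))
    for rs in arrs:
        for k in range(1, n):
            # counting step of the proof: nu-sum = r_1 + ... + r_k - k(k+1)/2
            if nu_sum(rs, k) != sum(rs[:k]) - k * (k + 1) // 2:
                return False
        for ss in arrs:
            lhs = eps(rs) * eps(ss)  # two-row symbol, Definition 1.1.3 (ii)
            for k in range(1, n):
                sign = (-1) ** (sum(rs[:k]) + sum(ss[:k]))
                rhs = (sign * eps(rs[:k]) * eps(ss[:k])
                       * eps(rs[k:]) * eps(ss[k:]))
                if lhs != rhs:
                    return False
    return True


print([check_theorem_1_1_4(n) for n in range(2, 5)])  # [True, True, True]
```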

1.2. Elementary properties of determinants

1.2.1. We shall now be concerned with the study of certain
properties of square arrays of (real or complex) numbers. A
typical array is
$$\begin{matrix} a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & & \vdots\\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{matrix} \qquad (1.2.1)$$

Definition 1.2.1. The numbers $a_{ij}$ $(i, j = 1,\dots,n)$ are the
ELEMENTS of the array (1.2.1). The elements
$$a_{i1},\ a_{i2},\ \dots,\ a_{in}$$
constitute the $i$-th ROW, and the elements
$$a_{1j},\ a_{2j},\ \dots,\ a_{nj}$$
constitute the $j$-th COLUMN of the array. The elements
$$a_{11},\ a_{22},\ \dots,\ a_{nn}$$
constitute the DIAGONAL of the array, and are called the DIAGONAL
ELEMENTS.





The double suffix notation used in (1.2.1) is particularly
appropriate since the two suffixes of an element specify completely its
position in the array. We shall reserve the first suffix for the row
and the second for the column, so that $a_{ij}$ denotes the element
standing in the $i$th row and $j$th column of the array (1.2.1).

With each square array we associate a certain number known 
as its determinant. 


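Definition 1.2.2. The DETERMINANT of the array (1.2.1) is the
number
$$\sum_{(\lambda_1,\dots,\lambda_n)} \epsilon(\lambda_1,\dots,\lambda_n)\, a_{1\lambda_1} a_{2\lambda_2} \cdots a_{n\lambda_n}, \qquad (1.2.2)$$
where the summation extends over all the $n!$ arrangements $(\lambda_1,\dots,\lambda_n)$
of $(1,\dots,n)$.† This determinant is denoted by
$$\begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & & \vdots\\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{vmatrix} \qquad (1.2.3)$$
or, more briefly, by $|a_{ij}|_n$.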

Determinants were first written in the form (1.2.3), though
without the use of double suffixes, by Cayley in 1841. In practice,
we often use a single letter, such as $D$, to denote a determinant.

The determinant (1.2.3) associated with the array (1.2.1) is
plainly a polynomial, of degree $n$, in the $n^2$ elements of the array.

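The determinant of the array consisting of the single element $a_{11}$
is, of course, equal to $a_{11}$. Further, we have
$$\begin{vmatrix} a_{11} & a_{12}\\ a_{21} & a_{22} \end{vmatrix} = \epsilon(1,2)\,a_{11}a_{22} + \epsilon(2,1)\,a_{12}a_{21} = a_{11}a_{22} - a_{12}a_{21},$$
$$\begin{vmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33} \end{vmatrix} = \epsilon(1,2,3)\,a_{11}a_{22}a_{33} + \epsilon(1,3,2)\,a_{11}a_{23}a_{32} + \epsilon(2,1,3)\,a_{12}a_{21}a_{33}$$
$$\qquad + \epsilon(2,3,1)\,a_{12}a_{23}a_{31} + \epsilon(3,1,2)\,a_{13}a_{21}a_{32} + \epsilon(3,2,1)\,a_{13}a_{22}a_{31}$$
$$= a_{11}a_{22}a_{33} - a_{11}a_{23}a_{32} - a_{12}a_{21}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} - a_{13}a_{22}a_{31}.$$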
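Definition 1.2.2 translates almost word for word into code. The Python sketch below sums over all $n!$ arrangements and compares the result with the six-term expansion of a 3-rowed determinant given above. The names are hypothetical. Indices run from 0 rather than 1, which does not affect ε because only differences of the λ's enter.

```python
from itertools import permutations
from math import prod


def eps(lams):
    """Definition 1.1.3 (i)."""
    n = len(lams)
    p = prod(lams[s] - lams[r] for r in range(n) for s in range(r + 1, n))
    return (p > 0) - (p < 0)


def det_leibniz(a):
    """Definition 1.2.2 with 0-based indices: the sum over all arrangements
    lams of (0, ..., n-1) of eps(lams) * a[0][lams[0]] * ... * a[n-1][lams[n-1]]."""
    n = len(a)
    return sum(eps(lams) * prod(a[i][lams[i]] for i in range(n))
               for lams in permutations(range(n)))


def det3_explicit(a):
    """The six-term expansion of a 3-rowed determinant displayed above."""
    (a11, a12, a13), (a21, a22, a23), (a31, a32, a33) = a
    return (a11 * a22 * a33 - a11 * a23 * a32 - a12 * a21 * a33
            + a12 * a23 * a31 + a13 * a21 * a32 - a13 * a22 * a31)


a = [[2, 0, 1], [1, 3, 2], [1, 1, 4]]
print(det_leibniz(a), det3_explicit(a))  # 18 18
```

The number of terms is $n!$, so this is practical only for very small arrays. The methods developed later in the chapter are far more economical.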


We observe that each term of the expression (1.2.2) for the
determinant contains one element from each row and one
element from each column of the array (1.2.1). Hence, if any array
contains a row or a column consisting entirely of zeros, its
determinant is equal to 0.

† The same convention will be observed whenever a symbol such as $(\lambda_1,\dots,\lambda_n)$
appears under the summation sign.

A determinant is a number associated with a square array.
However, it is customary to use the term ‘determinant’ for the
array itself as well as for this number. This usage is ambiguous but
convenient, and we shall adopt it since it will always be clear from
the context whether we refer to the array or to the value of the
determinant associated with it. In view of this convention we may
speak, for instance, about the elements, rows, and columns of a
determinant. The determinant (1.2.3) will be called an $n$-rowed
DETERMINANT, or a DETERMINANT OF ORDER $n$.

1.2.2. Definition 1.2.2 suffers from a lack of symmetry between
the row suffixes and the column suffixes. For the row suffixes
appearing in every term of the sum (1.2.2) are fixed as $1,\dots,n$,
whereas the column suffixes vary from term to term. The following
theorem shows, however, that this lack of symmetry is only
apparent.


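Theorem 1.2.1. Let $D$ be the value of the determinant (1.2.3).

(i) If $(\lambda_1,\dots,\lambda_n)$ is any fixed arrangement of $(1,\dots,n)$, then
$$D = \sum_{(\nu_1,\dots,\nu_n)} \epsilon\binom{\lambda_1,\dots,\lambda_n}{\nu_1,\dots,\nu_n}\, a_{\lambda_1\nu_1}\cdots a_{\lambda_n\nu_n}.$$

(ii) If $(\mu_1,\dots,\mu_n)$ is any fixed arrangement of $(1,\dots,n)$, then
$$D = \sum_{(\lambda_1,\dots,\lambda_n)} \epsilon\binom{\lambda_1,\dots,\lambda_n}{\mu_1,\dots,\mu_n}\, a_{\lambda_1\mu_1}\cdots a_{\lambda_n\mu_n}.$$

In view of Definition 1.2.2 we have (since $\epsilon(1,\dots,n) = 1$)
$$D = \sum_{(\nu_1,\dots,\nu_n)} \epsilon\binom{1,\dots,n}{\nu_1,\dots,\nu_n}\, a_{1\nu_1}\cdots a_{n\nu_n}. \qquad (1.2.4)$$
Let the same derangement which changes $(1,\dots,n)$ into the fixed
arrangement $(\lambda_1,\dots,\lambda_n)$ change $(\nu_1,\dots,\nu_n)$ into $(\nu_{\lambda_1},\dots,\nu_{\lambda_n})$. Then
$$a_{1\nu_1}\cdots a_{n\nu_n} = a_{\lambda_1\nu_{\lambda_1}}\cdots a_{\lambda_n\nu_{\lambda_n}}$$
and, by Theorem 1.1.2 (p. 3),
$$\epsilon\binom{1,\dots,n}{\nu_1,\dots,\nu_n} = \epsilon\binom{\lambda_1,\dots,\lambda_n}{\nu_{\lambda_1},\dots,\nu_{\lambda_n}}.$$
Now, by Theorem 1.1.1 (i) (p. 2), $(\nu_{\lambda_1},\dots,\nu_{\lambda_n})$ varies over all arrangements
of $(1,\dots,n)$ as $(\nu_1,\dots,\nu_n)$ does. Hence
$$D = \sum_{(\nu_1,\dots,\nu_n)} \epsilon\binom{\lambda_1,\dots,\lambda_n}{\nu_1,\dots,\nu_n}\, a_{\lambda_1\nu_1}\cdots a_{\lambda_n\nu_n},$$
and the first part of the theorem is therefore proved.

To prove the second part we again start from (1.2.4). Let the
same derangement which changes $(\nu_1,\dots,\nu_n)$ into the fixed arrangement
$(\mu_1,\dots,\mu_n)$ change $(1,\dots,n)$ into $(\lambda_1,\dots,\lambda_n)$. Then, by Theorem
1.1.2,
$$\epsilon\binom{1,\dots,n}{\nu_1,\dots,\nu_n} = \epsilon\binom{\lambda_1,\dots,\lambda_n}{\mu_1,\dots,\mu_n},$$
and also $\nu_{\lambda_1} = \mu_1,\ \dots,\ \nu_{\lambda_n} = \mu_n$, so that $a_{1\nu_1}\cdots a_{n\nu_n} = a_{\lambda_1\mu_1}\cdots a_{\lambda_n\mu_n}$.
Hence, by Theorem 1.1.1 (ii),
$$D = \sum_{(\lambda_1,\dots,\lambda_n)} \epsilon\binom{\lambda_1,\dots,\lambda_n}{\mu_1,\dots,\mu_n}\, a_{\lambda_1\mu_1}\cdots a_{\lambda_n\mu_n},$$
as asserted.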


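Theorem 1.2.2. The value of a determinant remains unaltered
when the rows and columns are interchanged, i.e.
$$\begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & & \vdots\\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{vmatrix} = \begin{vmatrix} a_{11} & a_{21} & \cdots & a_{n1}\\ a_{12} & a_{22} & \cdots & a_{n2}\\ \vdots & \vdots & & \vdots\\ a_{1n} & a_{2n} & \cdots & a_{nn} \end{vmatrix}.$$

Write $b_{rs} = a_{sr}$ $(r, s = 1,\dots,n)$. We have to show that
$|a_{ij}|_n = |b_{ij}|_n$. Now, by Theorem 1.2.1 (ii) (with $(\mu_1,\dots,\mu_n) = (1,\dots,n)$) and Definition 1.2.2,
$$|a_{ij}|_n = \sum_{(\lambda_1,\dots,\lambda_n)} \epsilon(\lambda_1,\dots,\lambda_n)\, a_{\lambda_1 1}\cdots a_{\lambda_n n} = \sum_{(\lambda_1,\dots,\lambda_n)} \epsilon(\lambda_1,\dots,\lambda_n)\, b_{1\lambda_1}\cdots b_{n\lambda_n} = |b_{ij}|_n,$$
and the theorem is therefore proved.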
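Theorems 1.2.1 and 1.2.2 can be spot-checked numerically. The Python sketch below compares the definition with the two alternative sums of Theorem 1.2.1 and with the determinant of the transposed array. The names are illustrative and indices are 0-based. A fixed random seed makes the run reproducible.

```python
import random
from itertools import permutations
from math import prod


def eps(lams):
    """Definition 1.1.3 (i)."""
    n = len(lams)
    p = prod(lams[s] - lams[r] for r in range(n) for s in range(r + 1, n))
    return (p > 0) - (p < 0)


def det_leibniz(a):
    """Definition 1.2.2 (row suffixes in natural order)."""
    n = len(a)
    return sum(eps(lams) * prod(a[i][lams[i]] for i in range(n))
               for lams in permutations(range(n)))


def det_fixed_rows(a, lams):
    """Theorem 1.2.1 (i): row suffixes fixed as lams, sum over column arrangements."""
    n = len(a)
    return sum(eps(lams) * eps(nus) * prod(a[lams[i]][nus[i]] for i in range(n))
               for nus in permutations(range(n)))


def det_fixed_cols(a, mus):
    """Theorem 1.2.1 (ii): column suffixes fixed as mus, sum over row arrangements."""
    n = len(a)
    return sum(eps(lams) * eps(mus) * prod(a[lams[i]][mus[i]] for i in range(n))
               for lams in permutations(range(n)))


def transpose(a):
    """Interchange rows and columns."""
    return [list(col) for col in zip(*a)]


rng = random.Random(0)
for _ in range(25):
    n = rng.randint(1, 4)
    a = [[rng.randint(-5, 5) for _ in range(n)] for _ in range(n)]
    d = det_leibniz(a)
    fixed = tuple(rng.sample(range(n), n))  # a random fixed arrangement
    assert det_fixed_rows(a, fixed) == d
    assert det_fixed_cols(a, fixed) == d
    assert det_leibniz(transpose(a)) == d   # Theorem 1.2.2
print("Theorems 1.2.1 and 1.2.2 confirmed on 25 random arrays")
```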


Exercise 1.2.1. Give a direct verification of Theorem 1.2.2 for 2-rowed 
and 3-rowed determinants. 

Theorem 1.2.2 shows that there is symmetry between the rows 
and columns of a determinant. Hence every statement proved 
about the rows of a determinant is equally valid for columns, and 
conversely. 

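Theorem 1.2.3. If two rows (or columns) of a determinant $D$ are
interchanged, then the resulting determinant has the value $-D$.

Let $1 \le r < s \le n$, and denote by $D' = |a'_{ij}|_n$ the determinant
obtained by interchanging the $r$th and $s$th rows in $D = |a_{ij}|_n$. Then
$$a'_{ij} = \begin{cases} a_{ij} & (i \ne r,\ i \ne s),\\ a_{sj} & (i = r),\\ a_{rj} & (i = s).\end{cases}$$
Hence, by Definition 1.2.2,
$$D' = \sum_{(\lambda_1,\dots,\lambda_n)} \epsilon(\lambda_1,\dots,\lambda_n)\, a_{1\lambda_1}\cdots a_{s\lambda_r}\cdots a_{r\lambda_s}\cdots a_{n\lambda_n}.$$
But, by Theorem 1.1.3 (p. 4), $\epsilon(1,\dots,s,\dots,r,\dots,n) = -1$, and so
$$D' = -\sum_{(\lambda_1,\dots,\lambda_n)} \epsilon\binom{1,\dots,s,\dots,r,\dots,n}{\lambda_1,\dots,\lambda_r,\dots,\lambda_s,\dots,\lambda_n}\, a_{1\lambda_1}\cdots a_{s\lambda_r}\cdots a_{r\lambda_s}\cdots a_{n\lambda_n}.$$
Hence, by Theorem 1.2.1 (i), $D' = -D$.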


Corollary. If two rows (or two columns) of a determinant are
identical, then the determinant vanishes.

Let $D$ be a determinant with two identical rows, and denote by
$D'$ the determinant obtained from $D$ by interchanging these two
rows. Then obviously $D' = D$. But, by Theorem 1.2.3, $D' = -D$,
and therefore $D = 0$.
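The following Python sketch checks Theorem 1.2.3 and its corollary on random integer arrays. `swap_rows` is a hypothetical helper, and rows are numbered from 0.

```python
import random
from itertools import permutations
from math import prod


def eps(lams):
    """Definition 1.1.3 (i)."""
    n = len(lams)
    p = prod(lams[s] - lams[r] for r in range(n) for s in range(r + 1, n))
    return (p > 0) - (p < 0)


def det_leibniz(a):
    """Definition 1.2.2."""
    n = len(a)
    return sum(eps(lams) * prod(a[i][lams[i]] for i in range(n))
               for lams in permutations(range(n)))


def swap_rows(a, r, s):
    """Copy of a with rows r and s (0-based) interchanged."""
    b = [row[:] for row in a]
    b[r], b[s] = b[s], b[r]
    return b


rng = random.Random(1)
for _ in range(30):
    n = rng.randint(2, 4)
    a = [[rng.randint(-4, 4) for _ in range(n)] for _ in range(n)]
    d = det_leibniz(a)
    for r in range(n):
        for s in range(r + 1, n):
            assert det_leibniz(swap_rows(a, r, s)) == -d  # Theorem 1.2.3
            b = [row[:] for row in a]
            b[s] = b[r][:]                                 # rows r and s now identical
            assert det_leibniz(b) == 0                     # the corollary
print("Theorem 1.2.3 and its corollary confirmed")
```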

Exercise 1.2.2. Let $r_1 < \dots < r_k$. Show that, if the rows with suffixes
$r_1, r_2, \dots, r_k$ of a determinant $D$ are moved into 1st, 2nd, ..., $k$th place
respectively while the relative order of the remaining rows stays unchanged,
then the resulting determinant is equal to
$$(-1)^{r_1+r_2+\dots+r_k-\frac12 k(k+1)}D.$$
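The sign in Exercise 1.2.2 can be tested on a random 5-rowed determinant by trying every choice of $r_1 < \dots < r_k$. This Python sketch checks the stated answer and does not solve the exercise. `move_rows_to_top` is an illustrative helper that uses the 1-based suffixes of the text.

```python
import random
from itertools import combinations, permutations
from math import prod


def eps(lams):
    """Definition 1.1.3 (i)."""
    n = len(lams)
    p = prod(lams[s] - lams[r] for r in range(n) for s in range(r + 1, n))
    return (p > 0) - (p < 0)


def det_leibniz(a):
    """Definition 1.2.2."""
    n = len(a)
    return sum(eps(lams) * prod(a[i][lams[i]] for i in range(n))
               for lams in permutations(range(n)))


def move_rows_to_top(a, rows):
    """Rows with the (1-based, increasing) suffixes in `rows` are moved into
    1st, 2nd, ... place; the remaining rows keep their relative order."""
    chosen = [a[r - 1] for r in rows]
    rest = [a[i] for i in range(len(a)) if i + 1 not in rows]
    return chosen + rest


rng = random.Random(2)
n = 5
a = [[rng.randint(-3, 3) for _ in range(n)] for _ in range(n)]
d = det_leibniz(a)
for k in range(1, n + 1):
    for rows in combinations(range(1, n + 1), k):  # r_1 < ... < r_k
        sign = (-1) ** (sum(rows) - k * (k + 1) // 2)
        assert det_leibniz(move_rows_to_top(a, rows)) == sign * d
print("the sign in Exercise 1.2.2 checks out for every choice of rows")
```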


When every element of a particular row or column of a
determinant is multiplied by a constant $k$, we say that the row or
column in question is multiplied by $k$.

Theorem 1.2.4. If a row (or column) of a determinant is multiplied
by a constant $k$, then the value of the determinant is also multiplied
by $k$.




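Let $D = |a_{ij}|_n$ be a given determinant and let $D'$ be obtained
from it by multiplying the $r$th row by $k$. Then
$$D' = \begin{vmatrix} a_{11} & \cdots & a_{1n}\\ \vdots & & \vdots\\ ka_{r1} & \cdots & ka_{rn}\\ \vdots & & \vdots\\ a_{n1} & \cdots & a_{nn} \end{vmatrix} = \sum_{(\lambda_1,\dots,\lambda_n)} \epsilon(\lambda_1,\dots,\lambda_n)\, a_{1\lambda_1}\cdots (ka_{r\lambda_r})\cdots a_{n\lambda_n} = kD.$$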


The next theorem provides a method for expressing any
determinant as a sum of two determinants.

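Theorem 1.2.5.
$$\begin{vmatrix} a_{11} & \cdots & a_{1r}+a'_{1r} & \cdots & a_{1n}\\ \vdots & & \vdots & & \vdots\\ a_{n1} & \cdots & a_{nr}+a'_{nr} & \cdots & a_{nn} \end{vmatrix} = \begin{vmatrix} a_{11} & \cdots & a_{1r} & \cdots & a_{1n}\\ \vdots & & \vdots & & \vdots\\ a_{n1} & \cdots & a_{nr} & \cdots & a_{nn} \end{vmatrix} + \begin{vmatrix} a_{11} & \cdots & a'_{1r} & \cdots & a_{1n}\\ \vdots & & \vdots & & \vdots\\ a_{n1} & \cdots & a'_{nr} & \cdots & a_{nn} \end{vmatrix}.$$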


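Denoting the determinant on the left-hand side by $|b_{ij}|_n$, we
have
$$b_{ij} = \begin{cases} a_{ij} & (j \ne r),\\ a_{ij} + a'_{ij} & (j = r).\end{cases}$$
Hence, by Theorem 1.2.1 (ii) (p. 7),
$$|b_{ij}|_n = \sum_{(\lambda_1,\dots,\lambda_n)} \epsilon(\lambda_1,\dots,\lambda_n)\, b_{\lambda_1 1}\cdots b_{\lambda_n n} = \sum_{(\lambda_1,\dots,\lambda_n)} \epsilon(\lambda_1,\dots,\lambda_n)\, a_{\lambda_1 1}\cdots (a_{\lambda_r r} + a'_{\lambda_r r})\cdots a_{\lambda_n n}$$
$$= \sum_{(\lambda_1,\dots,\lambda_n)} \epsilon(\lambda_1,\dots,\lambda_n)\, a_{\lambda_1 1}\cdots a_{\lambda_r r}\cdots a_{\lambda_n n} + \sum_{(\lambda_1,\dots,\lambda_n)} \epsilon(\lambda_1,\dots,\lambda_n)\, a_{\lambda_1 1}\cdots a'_{\lambda_r r}\cdots a_{\lambda_n n}$$
$$= \begin{vmatrix} a_{11} & \cdots & a_{1r} & \cdots & a_{1n}\\ \vdots & & \vdots & & \vdots\\ a_{n1} & \cdots & a_{nr} & \cdots & a_{nn} \end{vmatrix} + \begin{vmatrix} a_{11} & \cdots & a'_{1r} & \cdots & a_{1n}\\ \vdots & & \vdots & & \vdots\\ a_{n1} & \cdots & a'_{nr} & \cdots & a_{nn} \end{vmatrix}.$$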











Exercise 1.2.3. State the analogous result for rows. 

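A useful corollary to Theorem 1.2.5 can now be easily proved by
induction. It enables us to express a determinant, each of whose
elements is the sum of $h$ terms, as the sum of $h^n$ determinants.

Corollary.
$$\begin{vmatrix} a^{(1)}_{11}+\dots+a^{(h)}_{11} & \cdots & a^{(1)}_{1n}+\dots+a^{(h)}_{1n}\\ \vdots & & \vdots\\ a^{(1)}_{n1}+\dots+a^{(h)}_{n1} & \cdots & a^{(1)}_{nn}+\dots+a^{(h)}_{nn} \end{vmatrix} = \sum_{k_1,\dots,k_n=1}^{h} \begin{vmatrix} a^{(k_1)}_{11} & \cdots & a^{(k_n)}_{1n}\\ \vdots & & \vdots\\ a^{(k_1)}_{n1} & \cdots & a^{(k_n)}_{nn} \end{vmatrix}.$$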


Theorem 1.2.6. The value of a determinant remains unchanged
if to any row (or column) is added any multiple of another row (or
column).


By saying that the $s$th row of a determinant is added to the $r$th
row we mean, of course, that every element of the $s$th row is added
to the corresponding element of the $r$th row. Similar terminology
is used for columns.

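Let $D = |a_{ij}|_n$, and suppose that $D'$ denotes the determinant
obtained when $k$ times the $s$th row is added to the $r$th row in $D$.
Assuming that $r < s$ we have
$$D' = \begin{vmatrix} a_{11} & \cdots & a_{1n}\\ \vdots & & \vdots\\ a_{r1}+ka_{s1} & \cdots & a_{rn}+ka_{sn}\\ \vdots & & \vdots\\ a_{s1} & \cdots & a_{sn}\\ \vdots & & \vdots\\ a_{n1} & \cdots & a_{nn} \end{vmatrix}.$$
Hence, by Theorem 1.2.5 (as applied to rows),
$$D' = \begin{vmatrix} a_{11} & \cdots & a_{1n}\\ \vdots & & \vdots\\ a_{r1} & \cdots & a_{rn}\\ \vdots & & \vdots\\ a_{s1} & \cdots & a_{sn}\\ \vdots & & \vdots\\ a_{n1} & \cdots & a_{nn} \end{vmatrix} + \begin{vmatrix} a_{11} & \cdots & a_{1n}\\ \vdots & & \vdots\\ ka_{s1} & \cdots & ka_{sn}\\ \vdots & & \vdots\\ a_{s1} & \cdots & a_{sn}\\ \vdots & & \vdots\\ a_{n1} & \cdots & a_{nn} \end{vmatrix},$$
and so, by Theorem 1.2.4 and the corollary to Theorem 1.2.3,
$$D' = D + k \begin{vmatrix} a_{11} & \cdots & a_{1n}\\ \vdots & & \vdots\\ a_{s1} & \cdots & a_{sn}\\ \vdots & & \vdots\\ a_{s1} & \cdots & a_{sn}\\ \vdots & & \vdots\\ a_{n1} & \cdots & a_{nn} \end{vmatrix} = D,$$
the last determinant vanishing because its $r$th and $s$th rows are identical.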
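Theorems 1.2.4 and 1.2.6 are the workhorses of practical evaluation, and both are easy to confirm on examples. The Python sketch below multiplies a row by $k$ and adds $k$ times one row to another. It then compares determinants computed from Definition 1.2.2. The helpers are hypothetical, and rows are numbered from 0.

```python
import random
from itertools import permutations
from math import prod


def eps(lams):
    """Definition 1.1.3 (i)."""
    n = len(lams)
    p = prod(lams[s] - lams[r] for r in range(n) for s in range(r + 1, n))
    return (p > 0) - (p < 0)


def det_leibniz(a):
    """Definition 1.2.2."""
    n = len(a)
    return sum(eps(lams) * prod(a[i][lams[i]] for i in range(n))
               for lams in permutations(range(n)))


def scale_row(a, r, k):
    """Copy of a with row r multiplied by k."""
    b = [row[:] for row in a]
    b[r] = [k * x for x in b[r]]
    return b


def add_multiple_of_row(a, r, s, k):
    """Copy of a with k times row s added to row r (r != s)."""
    b = [row[:] for row in a]
    b[r] = [x + k * y for x, y in zip(b[r], b[s])]
    return b


rng = random.Random(3)
for _ in range(30):
    n = rng.randint(2, 4)
    a = [[rng.randint(-5, 5) for _ in range(n)] for _ in range(n)]
    d = det_leibniz(a)
    k = rng.randint(-5, 5)
    r, s = rng.sample(range(n), 2)
    assert det_leibniz(scale_row(a, r, k)) == k * d          # Theorem 1.2.4
    assert det_leibniz(add_multiple_of_row(a, r, s, k)) == d  # Theorem 1.2.6
print("Theorems 1.2.4 and 1.2.6 confirmed on 30 random arrays")
```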


1.3. Multiplication of determinants 

We shall next prove that it is always possible to express the
product of two determinants of the same order $n$ as a determinant
of order $n$.

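Theorem 1.3.1. (Multiplication theorem for determinants)
Let $A = |a_{ij}|_n$ and $B = |b_{ij}|_n$ be given determinants, and write
$C = |c_{ij}|_n$, where
$$c_{rs} = \sum_{i=1}^{n} a_{ri}b_{is} \qquad (r, s = 1,\dots,n).$$
Then
$$AB = C. \qquad (1.3.1)$$

We have
$$C = \sum_{(\lambda_1,\dots,\lambda_n)} \epsilon(\lambda_1,\dots,\lambda_n)\, c_{1\lambda_1}\cdots c_{n\lambda_n} = \sum_{(\lambda_1,\dots,\lambda_n)} \epsilon(\lambda_1,\dots,\lambda_n) \Big(\sum_{\mu_1=1}^{n} a_{1\mu_1}b_{\mu_1\lambda_1}\Big)\cdots\Big(\sum_{\mu_n=1}^{n} a_{n\mu_n}b_{\mu_n\lambda_n}\Big)$$
$$= \sum_{\mu_1=1}^{n}\cdots\sum_{\mu_n=1}^{n} a_{1\mu_1}\cdots a_{n\mu_n} \sum_{(\lambda_1,\dots,\lambda_n)} \epsilon(\lambda_1,\dots,\lambda_n)\, b_{\mu_1\lambda_1}\cdots b_{\mu_n\lambda_n}. \qquad (1.3.2)$$
By Definition 1.2.2 the inner sum in (1.3.2) is equal to
$$\begin{vmatrix} b_{\mu_1 1} & \cdots & b_{\mu_1 n}\\ \vdots & & \vdots\\ b_{\mu_n 1} & \cdots & b_{\mu_n n} \end{vmatrix}.$$
Hence, if any two $\mu$'s are equal, then, by the corollary to Theorem
1.2.3, the inner sum in (1.3.2) vanishes. It follows that in the $n$-fold
summation in (1.3.2) we can omit all sets of $\mu$'s which contain at
least two equal numbers. The summation then reduces to a simple
summation over the $n!$ arrangements $(\mu_1,\dots,\mu_n)$, and we therefore have
$$C = \sum_{(\mu_1,\dots,\mu_n)} a_{1\mu_1}\cdots a_{n\mu_n} \sum_{(\lambda_1,\dots,\lambda_n)} \epsilon(\lambda_1,\dots,\lambda_n)\, b_{\mu_1\lambda_1}\cdots b_{\mu_n\lambda_n}.$$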






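Hence, by Theorem 1.2.1 (i) (p. 7), since $\epsilon(\lambda_1,\dots,\lambda_n) = \epsilon(\mu_1,\dots,\mu_n)\,\epsilon\binom{\mu_1,\dots,\mu_n}{\lambda_1,\dots,\lambda_n}$,
$$C = \sum_{(\mu_1,\dots,\mu_n)} \epsilon(\mu_1,\dots,\mu_n)\, a_{1\mu_1}\cdots a_{n\mu_n} \cdot \begin{vmatrix} b_{11} & \cdots & b_{1n}\\ \vdots & & \vdots\\ b_{n1} & \cdots & b_{nn} \end{vmatrix} = \begin{vmatrix} a_{11} & \cdots & a_{1n}\\ \vdots & & \vdots\\ a_{n1} & \cdots & a_{nn} \end{vmatrix} \cdot \begin{vmatrix} b_{11} & \cdots & b_{1n}\\ \vdots & & \vdots\\ b_{n1} & \cdots & b_{nn} \end{vmatrix} = AB.$$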


The theorem just proved shows how we may form a determinant
which is equal to the product of two given determinants $A$ and $B$.
We have, in fact, $AB = C$, where the element standing in the $r$th
row and $s$th column of $C$ is obtained by multiplying together the
corresponding elements in the $r$th row of $A$ and the $s$th column of
$B$ and adding the products thus obtained. The determinant $C$
constructed in this way may be said to have been obtained by
multiplying $A$ and $B$ ‘rows by columns’. Now, by Theorem 1.2.2,
the values of $A$ and $B$ are unaltered if rows and columns in either
determinant or in both determinants are interchanged. Hence
we can equally well form the product $AB$ by carrying out the
multiplication ‘rows by rows’, or ‘columns by columns’, or ‘columns
by rows’. These conclusions are expressed in the next theorem.

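Theorem 1.3.2. The equality (1.3.1) continues to hold if the
determinant $C = |c_{ij}|_n$ is defined by any one of the following sets of
relations:
$$c_{rs} = \sum_{i=1}^{n} a_{ri}b_{si} \qquad (r, s = 1,\dots,n);$$
$$c_{rs} = \sum_{i=1}^{n} a_{ir}b_{is} \qquad (r, s = 1,\dots,n);$$
$$c_{rs} = \sum_{i=1}^{n} a_{ir}b_{si} \qquad (r, s = 1,\dots,n).$$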
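The four ways of forming the product in Theorems 1.3.1 and 1.3.2 can be compared side by side. In this Python sketch, `product_array` builds $c_{rs}$ by one of the four rules. Each resulting determinant is then checked against $AB$. The names are hypothetical and indices are 0-based.

```python
import random
from itertools import permutations
from math import prod


def eps(lams):
    """Definition 1.1.3 (i)."""
    n = len(lams)
    p = prod(lams[s] - lams[r] for r in range(n) for s in range(r + 1, n))
    return (p > 0) - (p < 0)


def det_leibniz(a):
    """Definition 1.2.2."""
    n = len(a)
    return sum(eps(lams) * prod(a[i][lams[i]] for i in range(n))
               for lams in permutations(range(n)))


# The summand of c_rs for each rule; r, s and the summation index i are 0-based.
RULES = {
    "rows by columns": lambda a, b, r, s, i: a[r][i] * b[i][s],  # Theorem 1.3.1
    "rows by rows": lambda a, b, r, s, i: a[r][i] * b[s][i],
    "columns by columns": lambda a, b, r, s, i: a[i][r] * b[i][s],
    "columns by rows": lambda a, b, r, s, i: a[i][r] * b[s][i],
}


def product_array(a, b, rule):
    """The array C = (c_rs) formed from A and B by one of the four rules."""
    term = RULES[rule]
    n = len(a)
    return [[sum(term(a, b, r, s, i) for i in range(n)) for s in range(n)]
            for r in range(n)]


rng = random.Random(4)
for _ in range(20):
    n = rng.randint(1, 4)
    a = [[rng.randint(-4, 4) for _ in range(n)] for _ in range(n)]
    b = [[rng.randint(-4, 4) for _ in range(n)] for _ in range(n)]
    for rule in RULES:
        assert det_leibniz(product_array(a, b, rule)) == det_leibniz(a) * det_leibniz(b)
print("AB = C for all four rules on 20 random pairs")
```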

An interesting application of Theorem 1.3.2 will be given in 
§ 1.4.1 (p. 19). 

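Exercise 1.3.1. Use the definition of a determinant to show that
$$\begin{vmatrix} a_{11} & \cdots & a_{1m} & 0 & \cdots & 0\\ \vdots & & \vdots & \vdots & & \vdots\\ a_{m1} & \cdots & a_{mm} & 0 & \cdots & 0\\ 0 & \cdots & 0 & 1 & \cdots & 0\\ \vdots & & \vdots & \vdots & \ddots & \vdots\\ 0 & \cdots & 0 & 0 & \cdots & 1 \end{vmatrix} = \begin{vmatrix} a_{11} & \cdots & a_{1m}\\ \vdots & & \vdots\\ a_{m1} & \cdots & a_{mm} \end{vmatrix}.$$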

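Deduce, by means of Theorem 1.3.1, that
$$\begin{vmatrix} a_{11} & \cdots & a_{1m} & 0 & \cdots & 0\\ \vdots & & \vdots & \vdots & & \vdots\\ a_{m1} & \cdots & a_{mm} & 0 & \cdots & 0\\ 0 & \cdots & 0 & b_{11} & \cdots & b_{1n}\\ \vdots & & \vdots & \vdots & & \vdots\\ 0 & \cdots & 0 & b_{n1} & \cdots & b_{nn} \end{vmatrix} = \begin{vmatrix} a_{11} & \cdots & a_{1m}\\ \vdots & & \vdots\\ a_{m1} & \cdots & a_{mm} \end{vmatrix} \cdot \begin{vmatrix} b_{11} & \cdots & b_{1n}\\ \vdots & & \vdots\\ b_{n1} & \cdots & b_{nn} \end{vmatrix}.$$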
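For Exercise 1.3.1, one possible route is to write the block array as a product formed ‘rows by columns’. The first factor has $A$ and a unit block on the diagonal; the second has a unit block and $B$. This is an assumption about the intended hint, not the author's text. The Python sketch below checks that factorization and the final identity on random examples. All helper names are hypothetical.

```python
import random
from itertools import permutations
from math import prod


def eps(lams):
    """Definition 1.1.3 (i)."""
    n = len(lams)
    p = prod(lams[s] - lams[r] for r in range(n) for s in range(r + 1, n))
    return (p > 0) - (p < 0)


def det_leibniz(a):
    """Definition 1.2.2."""
    n = len(a)
    return sum(eps(lams) * prod(a[i][lams[i]] for i in range(n))
               for lams in permutations(range(n)))


def identity(n):
    """The n-rowed array with 1 on the diagonal and 0 elsewhere."""
    return [[int(i == j) for j in range(n)] for i in range(n)]


def block_diag(a, b):
    """The (m+n)-rowed array with A top-left, B bottom-right, zeros elsewhere."""
    m, n = len(a), len(b)
    return [row + [0] * n for row in a] + [[0] * m + row for row in b]


def rows_by_columns(a, b):
    """The array of Theorem 1.3.1: c_rs = sum_i a_ri b_is."""
    n = len(a)
    return [[sum(a[r][i] * b[i][s] for i in range(n)) for s in range(n)]
            for r in range(n)]


rng = random.Random(6)
for _ in range(20):
    m, n = rng.randint(1, 3), rng.randint(1, 2)
    a = [[rng.randint(-4, 4) for _ in range(m)] for _ in range(m)]
    b = [[rng.randint(-4, 4) for _ in range(n)] for _ in range(n)]
    left, right = block_diag(a, identity(n)), block_diag(identity(m), b)
    assert det_leibniz(left) == det_leibniz(a)           # first part of the exercise
    assert rows_by_columns(left, right) == block_diag(a, b)
    assert det_leibniz(block_diag(a, b)) == det_leibniz(a) * det_leibniz(b)
print("block determinant identity confirmed on 20 random examples")
```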


1.4. Expansion theorems 

1.4.1. We have already obtained a number of results which can
be used in the evaluation of determinants. A procedure that is
still more effective for this purpose consists in expressing a
determinant in terms of other determinants of lower order. The object
of the present section is to develop such a procedure.

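Definition 1.4.1. The COFACTOR of the element $a_{rs}$ in the
determinant
$$D = \begin{vmatrix} a_{11} & \cdots & a_{1n}\\ \vdots & & \vdots\\ a_{n1} & \cdots & a_{nn} \end{vmatrix}$$
is defined as
$$A_{rs} = (-1)^{r+s} D_{rs} \qquad (r, s = 1,\dots,n),$$
where $D_{rs}$ is the determinant of order $n-1$ obtained when the $r$-th row
and $s$-th column are deleted from $D$.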

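For example, if
$$D = \begin{vmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33} \end{vmatrix},$$
then
$$A_{11} = (-1)^{1+1}\begin{vmatrix} a_{22} & a_{23}\\ a_{32} & a_{33} \end{vmatrix} = a_{22}a_{33} - a_{23}a_{32}$$
and
$$A_{23} = (-1)^{2+3}\begin{vmatrix} a_{11} & a_{12}\\ a_{31} & a_{32} \end{vmatrix} = a_{12}a_{31} - a_{11}a_{32}.$$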

Exercise 1.4.1. Suppose that $|b_{ij}|_n$ is the determinant obtained when
two adjacent rows (or columns) of a determinant $|a_{ij}|_n$ are interchanged.
Show that if the element $a_{rs}$ of $|a_{ij}|_n$ becomes the element $b_{pq}$ of $|b_{ij}|_n$, then
$B_{pq} = -A_{rs}$, where $A_{rs}$ denotes the cofactor of $a_{rs}$ in $|a_{ij}|_n$ and $B_{pq}$ the
cofactor of $b_{pq}$ in $|b_{ij}|_n$.












Theorem 1.4.1. (Expansion of determinants in terms of rows
and columns)

If the cofactor of $a_{pq}$ in $D = |a_{ij}|_n$ is denoted by $A_{pq}$, then
$$\sum_{k=1}^{n} a_{rk}A_{rk} = D \qquad (r = 1,\dots,n), \qquad (1.4.1)$$
$$\sum_{k=1}^{n} a_{kr}A_{kr} = D \qquad (r = 1,\dots,n). \qquad (1.4.2)$$


This theorem states, in fact, that we may obtain the value of a 
determinant by multiplying the elements of any one row or column 
by their cofactors and adding the products thus formed. The 
identity (1.4.1) is known as the expansion of the determinant D 
in terms of the elements of the rth row, or simply as the expansion 
of D in terms of the rth row. Similarly, (1.4.2) is known as the 
expansion of D in terms of the rth column. In view of Theorem 
1.2.2 (p. 8) it is, of course, sufficient to prove (1.4.1).

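We begin by showing that
$$\begin{vmatrix} 1 & 0 & \cdots & 0\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & & \vdots\\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{vmatrix} = \begin{vmatrix} a_{22} & \cdots & a_{2n}\\ \vdots & & \vdots\\ a_{n2} & \cdots & a_{nn} \end{vmatrix}. \qquad (1.4.3)$$
Let $B$, $B'$ denote the values of the determinants on the left-hand
side and the right-hand side respectively of (1.4.3). We write
$B = |b_{ij}|_n$, so that $b_{11} = 1$, $b_{12} = \dots = b_{1n} = 0$. Then, since every term of (1.2.2)
with $\lambda_1 \ne 1$ contains the factor $b_{1\lambda_1} = 0$,
$$B = \sum_{(\lambda_2,\dots,\lambda_n) = \mathscr{A}(2,\dots,n)} \epsilon(1,\lambda_2,\dots,\lambda_n)\, b_{2\lambda_2}\cdots b_{n\lambda_n}.$$
But, for any arrangement $(\lambda_2,\dots,\lambda_n)$ of $(2,\dots,n)$, we clearly have
$$\epsilon(1,\lambda_2,\dots,\lambda_n) = \epsilon(\lambda_2,\dots,\lambda_n).$$
Hence
$$B = \sum_{(\lambda_2,\dots,\lambda_n)} \epsilon(\lambda_2,\dots,\lambda_n)\, b_{2\lambda_2}\cdots b_{n\lambda_n} = B',$$
as asserted.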






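Next, by Theorems 1.2.4 and 1.2.5 (pp. 9-10), we have
$$D = \begin{vmatrix} a_{11} & \cdots & a_{1n}\\ \vdots & & \vdots\\ a_{r1} & \cdots & a_{rn}\\ \vdots & & \vdots\\ a_{n1} & \cdots & a_{nn} \end{vmatrix} = \sum_{k=1}^{n} a_{rk} \begin{vmatrix} a_{11} & \cdots & a_{1k} & \cdots & a_{1n}\\ \vdots & & \vdots & & \vdots\\ 0 & \cdots & 1 & \cdots & 0\\ \vdots & & \vdots & & \vdots\\ a_{n1} & \cdots & a_{nk} & \cdots & a_{nn} \end{vmatrix} = \sum_{k=1}^{n} a_{rk}\Delta_{rk}, \qquad (1.4.4)$$
where $\Delta_{rk}$ is the determinant obtained from $D$ when the $k$th element
in the $r$th row is replaced by 1 and all other elements in the $r$th row
are replaced by 0. By repeated application of Theorem 1.2.3 (p. 8),
moving first the $r$th row into first place by $r-1$ interchanges of adjacent rows
and then the $k$th column into first place by $k-1$ interchanges of adjacent columns,
we obtain
$$\Delta_{rk} = (-1)^{r-1} \begin{vmatrix} 0 & \cdots & 1 & \cdots & 0\\ a_{11} & \cdots & a_{1k} & \cdots & a_{1n}\\ \vdots & & \vdots & & \vdots\\ a_{r-1,1} & \cdots & a_{r-1,k} & \cdots & a_{r-1,n}\\ a_{r+1,1} & \cdots & a_{r+1,k} & \cdots & a_{r+1,n}\\ \vdots & & \vdots & & \vdots\\ a_{n1} & \cdots & a_{nk} & \cdots & a_{nn} \end{vmatrix}$$
$$= (-1)^{(r-1)+(k-1)} \begin{vmatrix} 1 & 0 & \cdots & 0 & 0 & \cdots & 0\\ a_{1k} & a_{11} & \cdots & a_{1,k-1} & a_{1,k+1} & \cdots & a_{1n}\\ \vdots & \vdots & & \vdots & \vdots & & \vdots\\ a_{r-1,k} & a_{r-1,1} & \cdots & a_{r-1,k-1} & a_{r-1,k+1} & \cdots & a_{r-1,n}\\ a_{r+1,k} & a_{r+1,1} & \cdots & a_{r+1,k-1} & a_{r+1,k+1} & \cdots & a_{r+1,n}\\ \vdots & \vdots & & \vdots & \vdots & & \vdots\\ a_{nk} & a_{n1} & \cdots & a_{n,k-1} & a_{n,k+1} & \cdots & a_{nn} \end{vmatrix}.$$
Hence, by (1.4.3),
$$\Delta_{rk} = (-1)^{r+k} D_{rk},$$
where $D_{rk}$ denotes the determinant obtained when the $r$th row and
$k$th column are deleted from $D$. Hence, by (1.4.4) and Definition 1.4.1,
$$D = \sum_{k=1}^{n} a_{rk}(-1)^{r+k}D_{rk} = \sum_{k=1}^{n} a_{rk}A_{rk},$$
and the theorem is proved.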


























We now possess a practical method for evaluating determinants. 
This consists in first using Theorem 1.2.6 (p. 11) to introduce a 
number of zeros into some row or column, and then expanding the 
determinant in terms of that row or column. Consider, for example, 
the determinant 


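$$D = \begin{vmatrix} 9 & 7 & 3 & -9 \\ 6 & 3 & 6 & -4 \\ 15 & 8 & 7 & -7 \\ -5 & -6 & 4 & 2 \end{vmatrix}.$$

Adding the last column to each of the first three we have

$$D = \begin{vmatrix} 0 & -2 & -6 & -9 \\ 2 & -1 & 2 & -4 \\ 8 & 1 & 0 & -7 \\ -3 & -4 & 6 & 2 \end{vmatrix}.$$

Next, we add once, twice, and four times the third row to the second row, first row, and fourth row respectively. This leads to the expression

$$D = \begin{vmatrix} 16 & 0 & -6 & -23 \\ 10 & 0 & 2 & -11 \\ 8 & 1 & 0 & -7 \\ 29 & 0 & 6 & -26 \end{vmatrix}.$$

Expanding $D$ in terms of the second column we obtain

$$D = -\begin{vmatrix} 16 & -6 & -23 \\ 10 & 2 & -11 \\ 29 & 6 & -26 \end{vmatrix},$$

and we can continue the process of reduction in a similar manner until $D$ is evaluated.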


Exercise 1.4.2. Show that $D = -532$.
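Exercise 1.4.2 can be checked mechanically. The Python sketch below is an illustration, and its helper names are hypothetical. It evaluates determinants by repeated expansion in terms of the first row, a direct use of (1.4.1). It then replays the reduction above with Theorem 1.2.6:

- it adds the last column to each of the first three columns;
- it then adds once, twice, and four times the third row to the second, first, and fourth rows.

Each intermediate determinant, and minus the final 3-rowed determinant, should equal $-532$.

```python
def det(a):
    """Determinant by expansion in terms of the first row, as in (1.4.1)."""
    if not a:
        return 1  # the empty determinant
    return sum(
        (-1) ** k * a[0][k] * det([row[:k] + row[k + 1:] for row in a[1:]])
        for k in range(len(a))
    )


def add_column(a, src, dst, c=1):
    """Add c times column src to column dst; by Theorem 1.2.6 the value is unchanged."""
    return [row[:dst] + [row[dst] + c * row[src]] + row[dst + 1:] for row in a]


def add_row(a, src, dst, c=1):
    """Add c times row src to row dst; by Theorem 1.2.6 the value is unchanged."""
    b = [row[:] for row in a]
    b[dst] = [x + c * y for x, y in zip(a[dst], a[src])]
    return b


D0 = [[9, 7, 3, -9], [6, 3, 6, -4], [15, 8, 7, -7], [-5, -6, 4, 2]]

D1 = D0
for col in (0, 1, 2):            # add the last column to each of the first three
    D1 = add_column(D1, 3, col)

D2 = add_row(D1, 2, 1, 1)        # once the third row to the second row
D2 = add_row(D2, 2, 0, 2)        # twice the third row to the first row
D2 = add_row(D2, 2, 3, 4)        # four times the third row to the fourth row

# Expanding D2 in terms of its second column leaves the single term -1 * |D3|.
D3 = [[16, -6, -23], [10, 2, -11], [29, 6, -26]]

print(D1)   # [[0, -2, -6, -9], [2, -1, 2, -4], [8, 1, 0, -7], [-3, -4, 6, 2]]
print(D2)   # [[16, 0, -6, -23], [10, 0, 2, -11], [8, 1, 0, -7], [29, 0, 6, -26]]
print(det(D0), det(D1), det(D2), -det(D3))   # -532 -532 -532 -532
```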

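The expansion theorem (Theorem 1.4.1) can be used to show that the value of the Vandermonde determinant

$$D = \begin{vmatrix} a_1^{n-1} & a_1^{n-2} & \cdots & a_1 & 1 \\ a_2^{n-1} & a_2^{n-2} & \cdots & a_2 & 1 \\ \vdots & \vdots & & \vdots & \vdots \\ a_n^{n-1} & a_n^{n-2} & \cdots & a_n & 1 \end{vmatrix} \qquad (1.4.5)$$

is given by

$$D = \prod_{1 \le i < j \le n} (a_i - a_j). \qquad (1.4.6)$$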




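The assertion is obviously true for $n = 2$. We shall assume that it is true for $n-1$, where $n \ge 3$, and deduce that it is true for $n$. We may clearly assume that all the $a$'s are distinct, for otherwise (1.4.6) is true trivially. Consider the determinant

$$\begin{vmatrix} x^{n-1} & x^{n-2} & \cdots & x & 1 \\ a_2^{n-1} & a_2^{n-2} & \cdots & a_2 & 1 \\ \vdots & \vdots & & \vdots & \vdots \\ a_n^{n-1} & a_n^{n-2} & \cdots & a_n & 1 \end{vmatrix}.$$

Expanding it in terms of the first row, we see that it is a polynomial in $x$, say $f(x)$, of degree not greater than $n-1$. Moreover

$$f(a_2) = \cdots = f(a_n) = 0,$$

since for $x = a_j$ the determinant has two identical rows. So $f(x)$ is divisible by each of the (distinct) factors $x-a_2,\ldots,x-a_n$. Thus

$$f(x) = K(x-a_2)\cdots(x-a_n);$$

and here $K$ is independent of $x$, as may be seen by comparing the degrees of the two sides of the equation. Now, by (1.4.1), the coefficient of $x^{n-1}$ in $f(x)$ is equal to

$$\begin{vmatrix} a_2^{n-2} & \cdots & a_2 & 1 \\ \vdots & & \vdots & \vdots \\ a_n^{n-2} & \cdots & a_n & 1 \end{vmatrix},$$

which, by the induction hypothesis, is equal to $\prod_{2 \le i < j \le n} (a_i - a_j)$. This, then, is the value of $K$; and we have

$$f(x) = \prod_{2 \le i < j \le n} (a_i - a_j) \prod_{j=2}^{n} (x - a_j).$$

We now complete the proof of (1.4.6) by substituting $x = a_1$.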
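A quick numerical check of (1.4.6) may help to fix the sign convention. That sign depends on the columns being arranged in descending powers $a_i^{n-1},\ldots,a_i,1$, as in (1.4.5). The Python sketch below is illustrative only, and its function names are not from the text. It builds the determinant in that arrangement and compares it with $\prod_{i<j}(a_i-a_j)$. It also tries a case with two equal $a$'s, where both sides vanish.

```python
from itertools import combinations


def det(a):
    """Determinant by expansion in terms of the first row (Theorem 1.4.1)."""
    if not a:
        return 1
    return sum(
        (-1) ** k * a[0][k] * det([row[:k] + row[k + 1:] for row in a[1:]])
        for k in range(len(a))
    )


def vandermonde(xs):
    """The determinant (1.4.5): row i is (a_i^(n-1), ..., a_i, 1)."""
    n = len(xs)
    return [[x ** (n - 1 - j) for j in range(n)] for x in xs]


def vandermonde_product(xs):
    """Right-hand side of (1.4.6): product of (a_i - a_j) over all i < j."""
    p = 1
    for i, j in combinations(range(len(xs)), 2):
        p *= xs[i] - xs[j]
    return p


for xs in ([3, -1, 4, 2], [1, 2, 3, 4, 5], [2, 7, 2]):
    print(xs, det(vandermonde(xs)), vandermonde_product(xs))
```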

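The result just obtained enables us to derive identities for discriminants of algebraic equations. The discriminant $\Delta$ of the equation

$$x^n + a_1x^{n-1} + \cdots + a_{n-1}x + a_n = 0,$$

whose roots are $\alpha_1,\ldots,\alpha_n$, is defined as

$$\Delta = \prod_{1 \le i < j \le n} (\alpha_i - \alpha_j)^2.$$

It follows that $\Delta = 0$ if and only if the equation has at least two equal roots. To express $\Delta$ in terms of the coefficients of the equation we observe that, in view of (1.4.6),

$$\Delta = \begin{vmatrix} \alpha_1^{n-1} & \cdots & \alpha_1 & 1 \\ \vdots & & \vdots & \vdots \\ \alpha_n^{n-1} & \cdots & \alpha_n & 1 \end{vmatrix} \cdot \begin{vmatrix} \alpha_1^{n-1} & \cdots & \alpha_1 & 1 \\ \vdots & & \vdots & \vdots \\ \alpha_n^{n-1} & \cdots & \alpha_n & 1 \end{vmatrix}.$$

Carrying out the multiplication columns by columns, we have

$$\Delta = \begin{vmatrix} s_{2n-2} & s_{2n-3} & \cdots & s_{n-1} \\ s_{2n-3} & s_{2n-4} & \cdots & s_{n-2} \\ \vdots & \vdots & & \vdots \\ s_{n-1} & s_{n-2} & \cdots & s_0 \end{vmatrix},$$

where $s_r = \alpha_1^r + \cdots + \alpha_n^r$ $(r = 0,1,2,\ldots)$. Using Newton's formulae† we can express $s_0, s_1, \ldots, s_{2n-2}$ in terms of the coefficients $a_1,\ldots,a_n$ of the equation, and hence obtain $\Delta$ in the desired form.

Consider, for example, the cubic equation $x^3 + px + q = 0$. Here

$$\Delta = \begin{vmatrix} s_4 & s_3 & s_2 \\ s_3 & s_2 & s_1 \\ s_2 & s_1 & s_0 \end{vmatrix},$$

and it is easily verified that

$$s_0 = 3, \quad s_1 = 0, \quad s_2 = -2p, \quad s_3 = -3q, \quad s_4 = 2p^2.$$

Hence $\Delta = -(4p^3 + 27q^2)$, and thus at least two roots of $x^3 + px + q = 0$ are equal if and only if $4p^3 + 27q^2 = 0$.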
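The route from roots to $\Delta$ via power sums can be sketched in Python. The sketch assumes the roots are known integers, so exact integer arithmetic suffices. The function names are hypothetical. The code forms $s_r=\alpha_1^r+\cdots+\alpha_n^r$, builds the determinant whose entry in row $p$ and column $q$ is $s_{2n-p-q}$, and compares it with $\prod_{i<j}(\alpha_i-\alpha_j)^2$. For the roots $1, 2, -3$ the cubic is $x^3-7x+6=0$. So $p=-7$, $q=6$, and $-(4p^3+27q^2)=400$.

```python
from itertools import combinations


def det(a):
    """Determinant by expansion in terms of the first row (Theorem 1.4.1)."""
    if not a:
        return 1
    return sum(
        (-1) ** k * a[0][k] * det([row[:k] + row[k + 1:] for row in a[1:]])
        for k in range(len(a))
    )


def power_sums(roots, count):
    """s_0, s_1, ..., s_(count-1), where s_r = alpha_1^r + ... + alpha_n^r."""
    return [sum(x ** r for x in roots) for r in range(count)]


def discriminant_from_power_sums(roots):
    """Delta as the determinant whose (p, q) entry is s_(2n-p-q), p, q = 1, ..., n."""
    n = len(roots)
    s = power_sums(roots, 2 * n - 1)
    # With 0-based p, q the suffix 2n - (p+1) - (q+1) becomes 2n - 2 - p - q.
    return det([[s[2 * n - 2 - p - q] for q in range(n)] for p in range(n)])


def discriminant_from_roots(roots):
    """Delta by its definition: product of (alpha_i - alpha_j)^2 over i < j."""
    d = 1
    for x, y in combinations(roots, 2):
        d *= (x - y) ** 2
    return d


roots = [1, 2, -3]          # x^3 - 7x + 6 = (x - 1)(x - 2)(x + 3)
p, q = -7, 6
print(power_sums(roots, 5))                    # [3, 0, 14, -18, 98]
print(discriminant_from_power_sums(roots))     # 400
print(discriminant_from_roots(roots))          # 400
print(-(4 * p ** 3 + 27 * q ** 2))             # 400
```

The power sums agree with the values quoted above: $s_2=-2p=14$, $s_3=-3q=-18$, $s_4=2p^2=98$.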


Exercise 1.4.3. Show, by the method indicated above, that the discriminant of the quadratic equation $x^2 + \mu x + \nu = 0$ is $\mu^2 - 4\nu$.

We now resume our discussion of the general theory of determinants.


Theorem 1.4.2. With the same notation as in Theorem 1.4.1 we have, for $r \neq s$,

$$\sum_{k=1}^{n} a_{rk}A_{sk} = 0, \qquad \sum_{k=1}^{n} a_{kr}A_{ks} = 0.$$

In other words, if each element of a row (or column) is multiplied by the cofactor of the corresponding element of another fixed row (or column), then the sum of the $n$ products thus formed is equal to zero. This result is an easy consequence of Theorem 1.4.1. We need, of course, prove only the first of the two stated identities.

If $D' = |a'_{ij}|_n$ denotes the determinant obtained from $D = |a_{ij}|_n$ when the $s$th row is replaced by the $r$th row, then

$$a'_{ij} = \begin{cases} a_{ij} & (i \neq s), \\ a_{rj} & (i = s). \end{cases}$$

Denoting by $A'_{ij}$ the cofactor of the element $a'_{ij}$ in $D'$, we clearly have

$$A'_{sk} = A_{sk} \qquad (k = 1,\ldots,n),$$

since these cofactors do not involve the $s$th row at all. Hence, by (1.4.1) (p. 16),

$$D' = \sum_{k=1}^{n} a'_{sk}A'_{sk} = \sum_{k=1}^{n} a_{rk}A_{sk}.$$

But the $r$th row and $s$th row of $D'$ are identical, and so $D' = 0$. This completes the proof.

† See Burnside and Panton, The Theory of Equations (10th edition), i. 166-7, or Perron, 12, i. 160-1.


It is often convenient to combine Theorems 1.4.1 and 1.4.2 into 
a single statement. For this purpose we need a new and most useful 
notation. 


Definition 1.4.2. The symbol $\delta_{rs}$, known as the Kronecker delta, is defined as

$$\delta_{rs} = \begin{cases} 1 & (r = s), \\ 0 & (r \neq s). \end{cases}$$


With the aid of the Kronecker delta Theorems 1.4.1 and 1.4.2 
can be combined in the following single theorem. 


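Theorem 1.4.3. If $A_{rs}$ denotes the cofactor of $a_{rs}$ in the determinant $D = |a_{ij}|_n$, then

$$\sum_{k=1}^{n} a_{rk}A_{sk} = \delta_{rs}D, \qquad \sum_{k=1}^{n} a_{kr}A_{ks} = \delta_{rs}D \qquad (r, s = 1,\ldots,n).$$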
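Theorem 1.4.3 says that the numbers $\sum_k a_{rk}A_{sk}$ $(r,s=1,\ldots,n)$ equal $D$ when $r = s$ and 0 otherwise. A small Python check, purely illustrative and with helper names not taken from the text, computes both sums of the theorem for every pair $(r,s)$ and compares them with $\delta_{rs}D$. It also tries an array with $D = 0$.

```python
def det(a):
    """Determinant by expansion in terms of the first row (Theorem 1.4.1)."""
    if not a:
        return 1
    return sum(
        (-1) ** k * a[0][k] * det([row[:k] + row[k + 1:] for row in a[1:]])
        for k in range(len(a))
    )


def cofactor(a, r, s):
    """A_rs: (-1)^(r+s) times the minor with row r and column s deleted."""
    minor = [row[:s] + row[s + 1:] for i, row in enumerate(a) if i != r]
    return (-1) ** (r + s) * det(minor)


def kronecker(r, s):
    """The Kronecker delta of Definition 1.4.2."""
    return 1 if r == s else 0


def check_theorem_1_4_3(a):
    """True if both identities of Theorem 1.4.3 hold for the square array a."""
    n = len(a)
    D = det(a)
    for r in range(n):
        for s in range(n):
            by_rows = sum(a[r][k] * cofactor(a, s, k) for k in range(n))
            by_cols = sum(a[k][r] * cofactor(a, k, s) for k in range(n))
            if by_rows != kronecker(r, s) * D or by_cols != kronecker(r, s) * D:
                return False
    return True


print(check_theorem_1_4_3([[1, 2, 0], [3, -1, 4], [2, 5, 1]]))   # True
print(check_theorem_1_4_3([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))    # True (here D = 0)
```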



1.4.2. Our next object is to obtain a generalization of the 
Expansion Theorem 1.4.1. We require some preliminary definitions. 

Definition 1.4.3. A $k$-rowed minor of an $n$-rowed determinant $D$ is any $k$-rowed determinant obtained when $n-k$ rows and $n-k$ columns are deleted from $D$.

Alternatively, we may say that a $k$-rowed minor of $D$ is obtained by retaining, with their relative order unchanged, only the elements common to $k$ specified rows and $k$ specified columns.

For instance, the determinant obtained from the $n$-rowed determinant $D$ by deletion of the $i$th row and $j$th column is an $(n-1)$-rowed minor of $D$. Each element of $D$ is, of course, a 1-rowed minor of $D$.

Exercise 1.4.4. Let $1 \le k < n$, and suppose that all $k$-rowed minors of a given $n$-rowed determinant $D$ vanish. Show that all $(k+1)$-rowed minors of $D$ vanish also.

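The $k$-rowed minor obtained from $D$ by retaining only the elements belonging to rows with suffixes $r_1,\ldots,r_k$ and columns with suffixes $s_1,\ldots,s_k$ will be denoted by

$$D(r_1,\ldots,r_k \mid s_1,\ldots,s_k).$$

Thus, for example, if

$$D = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix},$$

then

$$D(1,3 \mid 2,3) = \begin{vmatrix} a_{12} & a_{13} \\ a_{32} & a_{33} \end{vmatrix}.$$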


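Definition 1.4.4. The cofactor (or algebraic complement) $\hat{D}(r_1,\ldots,r_k \mid s_1,\ldots,s_k)$ of the minor $D(r_1,\ldots,r_k \mid s_1,\ldots,s_k)$ in a determinant $D$ is defined as

$$\hat{D}(r_1,\ldots,r_k \mid s_1,\ldots,s_k) = (-1)^{r_1+\cdots+r_k+s_1+\cdots+s_k}\, D(r_{k+1},\ldots,r_n \mid s_{k+1},\ldots,s_n),$$

where $r_{k+1},\ldots,r_n$ are the $n-k$ numbers among $1,\ldots,n$ other than $r_1,\ldots,r_k$, and $s_{k+1},\ldots,s_n$ are the $n-k$ numbers among $1,\ldots,n$ other than $s_1,\ldots,s_k$.

We note that for $k = 1$ this definition reduces to that of a cofactor of an element (Definition 1.4.1, p. 14). If $k = n$, i.e. if a minor coincides with the entire determinant, it is convenient to define its cofactor as 1.

Consider, by way of illustration, the 4-rowed determinant $D = |a_{ij}|_4$. Here

$$D(2,3 \mid 2,4) = \begin{vmatrix} a_{22} & a_{24} \\ a_{32} & a_{34} \end{vmatrix}$$

and

$$\hat{D}(2,3 \mid 2,4) = (-1)^{2+3+2+4} D(1,4 \mid 1,3) = -\begin{vmatrix} a_{11} & a_{13} \\ a_{41} & a_{43} \end{vmatrix}.$$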

Theorem 1.4.4. (Laplace's expansion theorem)

Let $D$ be an $n$-rowed determinant, and let $r_1,\ldots,r_k$ be integers such that $1 \le k < n$ and $1 \le r_1 < \cdots < r_k \le n$. Then

$$D = \sum_{1 \le u_1 < \cdots < u_k \le n} D(r_1,\ldots,r_k \mid u_1,\ldots,u_k)\, \hat{D}(r_1,\ldots,r_k \mid u_1,\ldots,u_k).$$

This theorem (which was obtained, in essence, by Laplace in 1772) furnishes us with an expansion of the determinant $D$ in terms of $k$ specified rows, namely, the rows with suffixes $r_1,\ldots,r_k$. We form all possible $k$-rowed minors of $D$ involving all these rows and multiply each of them by its cofactor; the sum of the products is then equal to $D$. An analogous expansion applies, of course, to columns. It should be noted that for $k = 1$ Theorem 1.4.4 reduces to the identity (1.4.1) on p. 16.

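To prove the theorem, let the numbers $r_{k+1},\ldots,r_n$ be defined by the requirements

$$r_{k+1} < \cdots < r_n, \qquad (r_1,\ldots,r_n) = \mathscr{A}(1,\ldots,n).$$

Then, by Theorems 1.2.1 (i) (p. 7) and 1.1.4 (p. 4) we have

$$D = \sum_{(\lambda_1,\ldots,\lambda_n)=\mathscr{A}(1,\ldots,n)} (-1)^{r_1+\cdots+r_k+\lambda_1+\cdots+\lambda_k}\, \epsilon(\lambda_1,\ldots,\lambda_k)\, \epsilon(\lambda_{k+1},\ldots,\lambda_n)\, a_{r_1\lambda_1}\cdots a_{r_n\lambda_n}. \qquad (1.4.7)$$

Now we can clearly obtain all arrangements $(\lambda_1,\ldots,\lambda_n)$ of $(1,\ldots,n)$, and each arrangement exactly once, as follows. We separate the numbers $1,\ldots,n$ in all possible ways into a set of $k$ and a set of $n-k$ numbers. We then let $(\lambda_1,\ldots,\lambda_k)$ vary over all arrangements of the first set and $(\lambda_{k+1},\ldots,\lambda_n)$ over all arrangements of the second set. Thus the condition $(\lambda_1,\ldots,\lambda_n) = \mathscr{A}(1,\ldots,n)$ below the summation sign in (1.4.7) can be replaced by the conditions

$$(u_1,\ldots,u_n) = \mathscr{A}(1,\ldots,n); \qquad (1.4.8)$$

$$u_1 < \cdots < u_k; \quad u_{k+1} < \cdots < u_n; \qquad (1.4.9)$$

$$(\lambda_1,\ldots,\lambda_k) = \mathscr{A}(u_1,\ldots,u_k); \quad (\lambda_{k+1},\ldots,\lambda_n) = \mathscr{A}(u_{k+1},\ldots,u_n).$$

Here $\lambda_1+\cdots+\lambda_k = u_1+\cdots+u_k$. Indicating by an accent that the summation is to be taken over the integers $u_1,\ldots,u_n$ satisfying (1.4.8) and (1.4.9), we therefore have

$$\begin{aligned} D &= \sum{}' (-1)^{r_1+\cdots+r_k+u_1+\cdots+u_k} \sum_{(\lambda_1,\ldots,\lambda_k)=\mathscr{A}(u_1,\ldots,u_k)} \epsilon(\lambda_1,\ldots,\lambda_k)\, a_{r_1\lambda_1}\cdots a_{r_k\lambda_k} \\ &\qquad \times \sum_{(\lambda_{k+1},\ldots,\lambda_n)=\mathscr{A}(u_{k+1},\ldots,u_n)} \epsilon(\lambda_{k+1},\ldots,\lambda_n)\, a_{r_{k+1}\lambda_{k+1}}\cdots a_{r_n\lambda_n} \\ &= \sum{}' (-1)^{r_1+\cdots+r_k+u_1+\cdots+u_k}\, D(r_1,\ldots,r_k \mid u_1,\ldots,u_k) \times D(r_{k+1},\ldots,r_n \mid u_{k+1},\ldots,u_n) \\ &= \sum{}' D(r_1,\ldots,r_k \mid u_1,\ldots,u_k)\, \hat{D}(r_1,\ldots,r_k \mid u_1,\ldots,u_k) \\ &= \sum_{1 \le u_1 < \cdots < u_k \le n} D(r_1,\ldots,r_k \mid u_1,\ldots,u_k)\, \hat{D}(r_1,\ldots,r_k \mid u_1,\ldots,u_k) \times \sum 1, \end{aligned}$$

where the inner sum is extended over all integers $u_{k+1},\ldots,u_n$ satisfying (1.4.8) and (1.4.9). Now the integers $u_{k+1},\ldots,u_n$ are clearly determined uniquely for each set of $u_1,\ldots,u_k$. Hence the value of the inner sum is equal to 1, and the theorem is proved.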

The natural way in which products of minors and their cofactors occur in the expansion of a determinant can be made intuitively clear as follows. Suppose we wish to expand an $n$-rowed determinant in terms of the rows with suffixes $r_1,\ldots,r_k$. We write every element in each of these rows in the form $a_{ij}+0$, and every element in each of the remaining rows in the form $0+a_{ij}$. Using the corollary to Theorem 1.2.5 (p. 11) we then obtain the given determinant as a sum of $2^n$ determinants. Each of these either vanishes or else may be expressed (by virtue of Exercise 1.3.1, p. 13, and after a preliminary rearrangement of rows and columns) as a product of a $k$-rowed minor and its cofactor. The reader will find it helpful actually to carry out the procedure described here, say for the case $n = 4$, $k = 2$, $r_1 = 1$, $r_2 = 3$.
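Laplace's expansion can likewise be sketched in Python. In the illustration below the function names are hypothetical. `minor(a, rows, cols)` plays the role of $D(r_1,\ldots,r_k\mid u_1,\ldots,u_k)$, and `complementary_cofactor` plays that of $\hat D$ in Definition 1.4.4. The sum runs over all increasing choices $u_1<\cdots<u_k$ of columns. Suffixes are 0-based in the code. This does not change the parity of $r_1+\cdots+r_k+u_1+\cdots+u_k$, since $2k$ is even. The usage lines take the case $n=4$, $k=2$, $r_1=1$, $r_2=3$ mentioned above, which is `(0, 2)` in 0-based form.

```python
from itertools import combinations


def det(a):
    """Determinant by expansion in terms of the first row (Theorem 1.4.1)."""
    if not a:
        return 1  # empty determinant; matches the convention that a full minor has cofactor 1
    return sum(
        (-1) ** k * a[0][k] * det([row[:k] + row[k + 1:] for row in a[1:]])
        for k in range(len(a))
    )


def minor(a, rows, cols):
    """D(rows | cols): keep the given rows and columns, relative order unchanged."""
    return [[a[i][j] for j in cols] for i in rows]


def complementary_cofactor(a, rows, cols):
    """hat D(rows | cols) as in Definition 1.4.4 (0-based suffixes)."""
    n = len(a)
    other_rows = [i for i in range(n) if i not in rows]
    other_cols = [j for j in range(n) if j not in cols]
    # Shifting all 2k suffixes by one leaves the parity of the exponent unchanged.
    sign = (-1) ** (sum(rows) + sum(cols))
    return sign * det(minor(a, other_rows, other_cols))


def laplace_expansion(a, rows):
    """Theorem 1.4.4: expand in terms of the rows listed (in increasing order)."""
    k = len(rows)
    return sum(
        det(minor(a, rows, cols)) * complementary_cofactor(a, rows, cols)
        for cols in combinations(range(len(a)), k)
    )


a = [[2, 0, 1, 3], [1, -1, 4, 0], [0, 5, 2, 1], [3, 1, 0, -2]]
print(det(a), laplace_expansion(a, (0, 2)))   # the two values agree
```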

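As an illustration of the use of Laplace's expansion we shall evaluate the determinant

$$D = \begin{vmatrix} 0 & 0 & a_{13} & a_{14} & a_{15} \\ 0 & 0 & a_{23} & a_{24} & 0 \\ 0 & 0 & a_{33} & 0 & 0 \\ 0 & a_{42} & a_{43} & a_{44} & a_{45} \\ a_{51} & a_{52} & a_{53} & a_{54} & a_{55} \end{vmatrix}$$

by expanding it in terms of the first three rows. The only 3-rowed minor which involves these three rows and does not necessarily vanish is

$$D(1,2,3 \mid 3,4,5) = \begin{vmatrix} a_{13} & a_{14} & a_{15} \\ a_{23} & a_{24} & 0 \\ a_{33} & 0 & 0 \end{vmatrix},$$

since every other such minor contains a column of zeros. Expanding this minor in terms of the last column, we obtain

$$D(1,2,3 \mid 3,4,5) = a_{15} \begin{vmatrix} a_{23} & a_{24} \\ a_{33} & 0 \end{vmatrix} = -a_{15}a_{24}a_{33}.$$

Furthermore

$$\hat{D}(1,2,3 \mid 3,4,5) = (-1)^{1+2+3+3+4+5} D(4,5 \mid 1,2) = \begin{vmatrix} 0 & a_{42} \\ a_{51} & a_{52} \end{vmatrix} = -a_{42}a_{51},$$

and so, by Theorem 1.4.4,

$$D = D(1,2,3 \mid 3,4,5)\, \hat{D}(1,2,3 \mid 3,4,5) = a_{15}a_{24}a_{33}a_{42}a_{51}.$$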
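The value just found can be confirmed numerically. The Python sketch below is an illustration; the random entries and the helper names are assumptions, not part of the text. It fills the positions of the displayed 5-rowed determinant that are not forced to be zero with random integers. It then compares the determinant with $a_{15}a_{24}a_{33}a_{42}a_{51}$.

```python
import random


def det(a):
    """Determinant by expansion in terms of the first row (Theorem 1.4.1)."""
    if not a:
        return 1
    return sum(
        (-1) ** k * a[0][k] * det([row[:k] + row[k + 1:] for row in a[1:]])
        for k in range(len(a))
    )


# 'x' marks a position that may hold a non-zero element in the example above.
PATTERN = ["00xxx", "00xx0", "00x00", "0xxxx", "xxxxx"]


def check_example(seed):
    """Fill the pattern with random integers and test D = a15 a24 a33 a42 a51."""
    rng = random.Random(seed)
    a = [[rng.randint(-9, 9) if c == "x" else 0 for c in row] for row in PATTERN]
    # a[0][4] is a_15, a[1][3] is a_24, and so on (0-based suffixes in the code).
    expected = a[0][4] * a[1][3] * a[2][2] * a[3][1] * a[4][0]
    return det(a) == expected


print(all(check_example(seed) for seed in range(20)))   # True
```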


1.5. Jacobi’s theorem 

With every determinant may be associated a second determinant 
of the same order whose elements are the cofactors of the elements 
of the first. We propose now to investigate the relation between 
two such determinants. 

Definition 1.5.1. If $A_{rs}$ denotes the cofactor of $a_{rs}$ in $D = |a_{rs}|_n$, then $D^* = |A_{rs}|_n$ is known as the adjugate (determinant) of $D$.

Our object is to express $D^*$ in terms of $D$ and, more generally, to establish the relation between corresponding minors in $D$ and $D^*$. In discussing these questions we shall require an important general principle concerning polynomials in several variables. Two polynomials, say $f(x_1,\ldots,x_m)$ and $g(x_1,\ldots,x_m)$, are said to be identically equal if $f(x_1,\ldots,x_m) = g(x_1,\ldots,x_m)$ for all values of $x_1,\ldots,x_m$. Again, the two polynomials are said to be formally equal if the corresponding coefficients in $f$ and $g$ are equal. It is well known that identity and formal equality imply each other. We shall express this relation between the polynomials $f$ and $g$ by writing $f = g$.

Theorem 1.5.1. Let $f$, $g$, $h$ be polynomials in $m$ variables. If $fg = fh$ and $f \neq 0$, then $g = h$.

When $m = 1$ this is a well-known elementary result. For the proof of the theorem for $m > 1$ we must refer the reader elsewhere.†

† See, for example, van der Waerden, Modern Algebra (English edition), i. 47.

Theorem 1.5.2. If $D$ is an $n$-rowed determinant and $D^*$ its adjugate, then

$$D^* = D^{n-1}.$$

This formula was discovered by Cauchy in 1812. To prove it, we write $D = |a_{ij}|_n$, $D^* = |A_{ij}|_n$, and form the product $DD^*$ rows by rows. Thus, by Theorem 1.4.3,

$$DD^* = \Bigl|\sum_{k=1}^{n} a_{ik}A_{jk}\Bigr|_n = |\delta_{ij}D|_n = \begin{vmatrix} D & 0 & \cdots & 0 \\ 0 & D & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & D \end{vmatrix},$$

and therefore

$$DD^* = D^n. \qquad (1.5.1)$$

If now $D \neq 0$, then, dividing both sides of (1.5.1) by $D$, we obtain the required result. If, however, $D = 0$ this obvious device fails, and we have recourse to Theorem 1.5.1.

Let us regard $D$ as a polynomial in its elements. The adjugate determinant $D^*$ is then a polynomial in the same elements, and (1.5.1) is a polynomial identity. But $D$ is not an identically vanishing polynomial and so, by (1.5.1) and Theorem 1.5.1 (with $f = D$, $g = D^*$, $h = D^{n-1}$), we obtain the required result.†
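Cauchy's formula $D^* = D^{n-1}$ can be tested on small integer examples. These include one with $D=0$, where the division argument fails but the formula still holds. The Python sketch below is illustrative only, and the helper names are hypothetical. It forms $D^*=|A_{ij}|_n$ from the cofactors and compares it with $D^{n-1}$.

```python
def det(a):
    """Determinant by expansion in terms of the first row (Theorem 1.4.1)."""
    if not a:
        return 1
    return sum(
        (-1) ** k * a[0][k] * det([row[:k] + row[k + 1:] for row in a[1:]])
        for k in range(len(a))
    )


def cofactor(a, r, s):
    """A_rs: (-1)^(r+s) times the minor with row r and column s deleted."""
    minor = [row[:s] + row[s + 1:] for i, row in enumerate(a) if i != r]
    return (-1) ** (r + s) * det(minor)


def adjugate_determinant(a):
    """D* = |A_ij|_n, the determinant formed from the cofactors of a."""
    n = len(a)
    return det([[cofactor(a, i, j) for j in range(n)] for i in range(n)])


for a in ([[2, 1, 0], [1, 3, 1], [0, 1, 4]],
          [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
          [[1, 2, 0, -1], [3, 0, 1, 2], [0, 1, 1, 1], [2, -1, 0, 3]]):
    n = len(a)
    print(det(a), adjugate_determinant(a), det(a) ** (n - 1))
```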

Our next result, the main result of the present section, was discovered by Jacobi in 1833.

Theorem 1.5.3. (Jacobi's theorem)

If $M$ is a $k$-rowed minor of a determinant $D$, $M^*$ the corresponding minor of the adjugate determinant $D^*$, and $\hat{M}$ the cofactor of $M$ in $D$, then

$$M^* = D^{k-1}\hat{M}. \qquad (1.5.2)$$

Before proving this formula we point out a few special cases. The order of $D$ is, as usual, denoted by $n$.

(i) If $k = 1$, then (1.5.2) simply reduces to the definition of cofactors of elements of a determinant.

(ii) If $k = n$, then (1.5.2) reduces to Theorem 1.5.2.

(iii) For $k = n-1$ the formula (1.5.2) states that if $D = |a_{ij}|_n$, $D^* = |A_{ij}|_n$, then the cofactor of $A_{ij}$ in $D^*$ is equal to $D^{n-2}a_{ij}$.

(iv) For $k = 2$, (1.5.2) implies that if $D = 0$, then every 2-rowed minor of $D^*$ vanishes.

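To prove (1.5.2) we first consider the special case when $M$ is situated in the top left-hand corner of $D$, so that

$$M = \begin{vmatrix} a_{11} & \cdots & a_{1k} \\ \vdots & & \vdots \\ a_{k1} & \cdots & a_{kk} \end{vmatrix}, \qquad \hat{M} = \begin{vmatrix} a_{k+1,k+1} & \cdots & a_{k+1,n} \\ \vdots & & \vdots \\ a_{n,k+1} & \cdots & a_{nn} \end{vmatrix}, \qquad M^* = \begin{vmatrix} A_{11} & \cdots & A_{1k} \\ \vdots & & \vdots \\ A_{k1} & \cdots & A_{kk} \end{vmatrix}.$$

† Alternative proofs which do not depend on Theorem 1.5.1 will be found in § 1.6.3 and § 3.6.

Multiplying determinants rows by rows and using Theorem 1.4.3 (p. 20), we obtain

$$D \cdot \begin{vmatrix} A_{11} & \cdots & A_{1k} & A_{1,k+1} & \cdots & A_{1n} \\ \vdots & & \vdots & \vdots & & \vdots \\ A_{k1} & \cdots & A_{kk} & A_{k,k+1} & \cdots & A_{kn} \\ 0 & \cdots & 0 & 1 & \cdots & 0 \\ \vdots & & \vdots & \vdots & & \vdots \\ 0 & \cdots & 0 & 0 & \cdots & 1 \end{vmatrix} = \begin{vmatrix} D & \cdots & 0 & a_{1,k+1} & \cdots & a_{1n} \\ \vdots & & \vdots & \vdots & & \vdots \\ 0 & \cdots & D & a_{k,k+1} & \cdots & a_{kn} \\ 0 & \cdots & 0 & a_{k+1,k+1} & \cdots & a_{k+1,n} \\ \vdots & & \vdots & \vdots & & \vdots \\ 0 & \cdots & 0 & a_{n,k+1} & \cdots & a_{nn} \end{vmatrix}.$$

Now, by Laplace's expansion theorem, the second determinant on the left is equal to $M^*$, while the determinant on the right is equal to $D^k\hat{M}$. Thus

$$DM^* = D^k\hat{M}.$$

Since this is a polynomial identity in the elements of $D$, and since $D$ does not vanish identically, it follows by Theorem 1.5.1 that (1.5.2) is valid for the special case under consideration.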

We next turn to the general case, and suppose the minor $M$ to consist of those elements of $D$ which belong to the rows with suffixes $r_1,\ldots,r_k$ and to the columns with suffixes $s_1,\ldots,s_k$ (where $r_1 < \cdots < r_k$ and $s_1 < \cdots < s_k$). We write

$$t = r_1 + \cdots + r_k + s_1 + \cdots + s_k.$$

Our aim is to reduce the general case to the special case considered above. We rearrange the rows and columns of $D$ so that the minor $M$ is moved to the top left-hand corner, while the relative order of the rows and columns not involved in $M$ remains unchanged. We denote:

- the new determinant thus obtained by $\mathfrak{D}$;
- the $k$-rowed minor in its top left-hand corner by $\mathfrak{M}$;
- the cofactor of $\mathfrak{M}$ in $\mathfrak{D}$ by $\hat{\mathfrak{M}}$;
- the $k$-rowed minor in the top left-hand corner of the adjugate determinant $\mathfrak{D}^*$ by $\mathfrak{M}^*$.

In view of the special case already discussed we then have

$$\mathfrak{M}^* = \mathfrak{D}^{k-1}\hat{\mathfrak{M}}. \qquad (1.5.3)$$

Now obviously $\mathfrak{M} = M$; and, by Exercise 1.2.2 (p. 9),

$$\mathfrak{D} = (-1)^t D. \qquad (1.5.4)$$

It is, moreover, clear that

$$\hat{\mathfrak{M}} = (-1)^t \hat{M}. \qquad (1.5.5)$$

In view of Exercise 1.4.1 (p. 14) it follows easily that the cofactor of $a_{ij}$ in $\mathfrak{D}$ is equal to $(-1)^t A_{ij}$.† Hence

$$\mathfrak{M}^* = (-1)^{kt} M^*, \qquad (1.5.6)$$

and we complete the proof of the theorem by substituting (1.5.4), (1.5.5), and (1.5.6) in (1.5.3).
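Jacobi's theorem can be checked exhaustively for a small determinant. For every $k$ and every choice of $k$ rows and $k$ columns, the minor $M^*$ of $D^*$ should equal $D^{k-1}\hat M$. The Python sketch below is illustrative, and its helper names are hypothetical. It runs this check for a 4-rowed integer determinant and for one with $D = 0$.

```python
from itertools import combinations


def det(a):
    """Determinant by expansion in terms of the first row (Theorem 1.4.1)."""
    if not a:
        return 1  # empty determinant, so a full minor gets cofactor 1
    return sum(
        (-1) ** k * a[0][k] * det([row[:k] + row[k + 1:] for row in a[1:]])
        for k in range(len(a))
    )


def minor(a, rows, cols):
    """The minor formed by the given rows and columns, relative order unchanged."""
    return [[a[i][j] for j in cols] for i in rows]


def minor_cofactor(a, rows, cols):
    """hat M for the minor in the given rows and columns (Definition 1.4.4, 0-based)."""
    n = len(a)
    other_rows = [i for i in range(n) if i not in rows]
    other_cols = [j for j in range(n) if j not in cols]
    return (-1) ** (sum(rows) + sum(cols)) * det(minor(a, other_rows, other_cols))


def check_jacobi(a):
    """True if M* = D^(k-1) * hat M for every minor M of every order k."""
    n = len(a)
    D = det(a)
    # Elements A_ij of the adjugate D*: the cofactors of the 1-rowed minors.
    adj = [[minor_cofactor(a, (i,), (j,)) for j in range(n)] for i in range(n)]
    for k in range(1, n + 1):
        for rows in combinations(range(n), k):
            for cols in combinations(range(n), k):
                m_star = det(minor(adj, rows, cols))
                if m_star != D ** (k - 1) * minor_cofactor(a, rows, cols):
                    return False
    return True


print(check_jacobi([[1, 2, 0, -1], [3, 0, 1, 2], [0, 1, 1, 1], [2, -1, 0, 3]]))  # True
print(check_jacobi([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))                            # True
```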


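Exercise 1.5.1. Let $A, H, G, \ldots$ be the cofactors of the elements $a, h, g, \ldots$ in the determinant

$$\Delta = \begin{vmatrix} a & h & g \\ h & b & f \\ g & f & c \end{vmatrix}.$$

Show that $aA + hH + gG = \Delta$, $aH + hB + gF = 0$, and also that the cofactors of the elements in the determinant

$$\begin{vmatrix} A & H & G \\ H & B & F \\ G & F & C \end{vmatrix}$$

are equal to $a\Delta, h\Delta, g\Delta, \ldots$ respectively.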


1.6. Two special theorems on linear equations 

We shall next prove two special theorems on linear equations and derive some of their consequences. The second theorem is needed for establishing the basis theorems (Theorems 2.3.2 and 2.3.3) in the next chapter. In touching on the subject of linear equations we do not at present seek to develop a general theory, a task which we defer till Chapter V.

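1.6.1. Theorem 1.6.1. Let $n \ge 1$, and let $D = |a_{ij}|_n$ be a given determinant. Then a necessary and sufficient condition for the existence of numbers $t_1,\ldots,t_n$, not all zero, satisfying the equations

$$a_{i1}t_1 + a_{i2}t_2 + \cdots + a_{in}t_n = 0 \qquad (i = 1,\ldots,n) \qquad (1.6.1)$$

is

$$D = 0. \qquad (1.6.2)$$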


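The sufficiency of the stated condition is established by induction with respect to $n$. For $n = 1$ the assertion is true trivially. Suppose it holds for $n-1$, where $n \ge 2$; we shall then show that it also holds for $n$. Let (1.6.2) be satisfied. If $a_{11} = \cdots = a_{n1} = 0$, then (1.6.1) is satisfied by

$$t_1 = 1, \qquad t_2 = \cdots = t_n = 0,$$

and the required assertion is seen to hold. If, on the other hand, the numbers $a_{11},\ldots,a_{n1}$ do not all vanish, we may assume, without loss of generality, that $a_{11} \neq 0$. In that case we subtract, for $i = 2,\ldots,n$, $a_{i1}/a_{11}$ times the first row from the $i$th row in $D$ and obtain

$$D = \begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ 0 & b_{22} & \cdots & b_{2n} \\ \vdots & \vdots & & \vdots \\ 0 & b_{n2} & \cdots & b_{nn} \end{vmatrix} = a_{11} \begin{vmatrix} b_{22} & \cdots & b_{2n} \\ \vdots & & \vdots \\ b_{n2} & \cdots & b_{nn} \end{vmatrix},$$

where

$$b_{ij} = a_{ij} - \frac{a_{i1}a_{1j}}{a_{11}} \qquad (i, j = 2,\ldots,n).$$

Hence

$$\begin{vmatrix} b_{22} & \cdots & b_{2n} \\ \vdots & & \vdots \\ b_{n2} & \cdots & b_{nn} \end{vmatrix} = 0,$$

and so, by the induction hypothesis, there exist numbers $t_2,\ldots,t_n$, not all zero, such that

$$\sum_{j=2}^{n} b_{ij}t_j = 0 \qquad (i = 2,\ldots,n). \qquad (1.6.3)$$

Let $t_1$ now be defined by the equation

$$t_1 = -\frac{1}{a_{11}} \sum_{j=2}^{n} a_{1j}t_j, \qquad (1.6.4)$$

so that

$$\sum_{j=1}^{n} a_{1j}t_j = 0. \qquad (1.6.5)$$

By (1.6.3) and (1.6.4) we have

$$\sum_{j=1}^{n} a_{ij}t_j = a_{i1}t_1 + \sum_{j=2}^{n} a_{ij}t_j = \sum_{j=2}^{n} b_{ij}t_j = 0 \qquad (i = 2,\ldots,n), \qquad (1.6.6)$$

and (1.6.5) and (1.6.6) are together equivalent to (1.6.1). The sufficiency of (1.6.2) is therefore established.

† It must, of course, be remembered that $a_{ij}$ does not necessarily stand in the $i$th row and $j$th column of $\mathfrak{D}$.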

To prove the necessity of (1.6.2) we again argue by induction. We have to show that if $D \neq 0$ and the numbers $t_1,\ldots,t_n$ satisfy (1.6.1), then $t_1 = \cdots = t_n = 0$. For $n = 1$ this assertion is true trivially. Suppose, next, that it holds for $n-1$, where $n \ge 2$. The numbers $a_{11},\ldots,a_{n1}$ are not all zero (since $D \neq 0$), and we may, therefore, assume that $a_{11} \neq 0$. If $t_1,\ldots,t_n$ satisfy (1.6.1), then (1.6.4) holds and therefore so does (1.6.3). But

$$\begin{vmatrix} b_{22} & \cdots & b_{2n} \\ \vdots & & \vdots \\ b_{n2} & \cdots & b_{nn} \end{vmatrix} = \frac{D}{a_{11}} \neq 0.$$

Hence, by (1.6.3) and the induction hypothesis, $t_2 = \cdots = t_n = 0$. It follows, by (1.6.4), that $t_1 = 0$; and the proof is therefore complete.†

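An alternative proof of the necessity of condition (1.6.2) can be based on Theorem 1.4.3 (p. 20). Suppose that there exist numbers $t_1,\ldots,t_n$, not all zero, satisfying (1.6.1), i.e.

$$\sum_{j=1}^{n} a_{ij}t_j = 0 \qquad (i = 1,\ldots,n).$$

Denoting by $A_{ik}$ the cofactor of $a_{ik}$ in $D$ we therefore have

$$\sum_{i=1}^{n} A_{ik} \sum_{j=1}^{n} a_{ij}t_j = 0 \qquad (k = 1,\ldots,n),$$

i.e.

$$\sum_{j=1}^{n} t_j \sum_{i=1}^{n} a_{ij}A_{ik} = 0 \qquad (k = 1,\ldots,n).$$

Hence, by Theorem 1.4.3,

$$\sum_{j=1}^{n} t_j\delta_{jk}D = 0 \qquad (k = 1,\ldots,n),$$

i.e. $t_kD = 0$ $(k = 1,\ldots,n)$. But, by hypothesis, $t_1,\ldots,t_n$ are not all equal to zero; and therefore $D = 0$.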

An obvious but useful consequence of Theorem 1.6.1 is as follows.

Theorem 1.6.2. Let $a_{ij}$ $(i = 1,\ldots,n-1;\ j = 1,\ldots,n)$ be given numbers, where $n \ge 2$. Then there exists at least one set of numbers $t_1,\ldots,t_n$, not all zero, such that

$$a_{i1}t_1 + a_{i2}t_2 + \cdots + a_{in}t_n = 0 \qquad (i = 1,\ldots,n-1). \qquad (1.6.7)$$


To the $n-1$ equations comprising (1.6.7) we add the equation

$$0 \cdot t_1 + \cdots + 0 \cdot t_n = 0,$$

which does not, of course, affect the choice of permissible sets of the numbers $t_1,\ldots,t_n$. Since

$$\begin{vmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{n-1,1} & \cdots & a_{n-1,n} \\ 0 & \cdots & 0 \end{vmatrix} = 0,$$

it follows by the previous theorem that there exist values of $t_1,\ldots,t_n$, not all zero, which satisfy (1.6.7).

† The reader should note that the proof just given depends essentially on the elementary device of reducing the number of 'unknowns' from $n$ to $n-1$ by elimination of $t_1$.

It is interesting to observe that we can easily give a direct proof of Theorem 1.6.2, without appealing to the theory of determinants, by using essentially the same argument as in the proof of Theorem 1.6.1. For $n = 2$ the assertion is obviously true. Assume that it holds for $n-1$, where $n \ge 3$. If now $a_{11} = \cdots = a_{n-1,1} = 0$, then the equations (1.6.7) are satisfied by $t_1 = 1$, $t_2 = \cdots = t_n = 0$. If, however, $a_{11},\ldots,a_{n-1,1}$ do not all vanish, we may assume that $a_{11} \neq 0$. In that case we consider the equations

$$\begin{aligned} a_{11}t_1 + a_{12}t_2 + \cdots + a_{1n}t_n &= 0, \\ b_{22}t_2 + \cdots + b_{2n}t_n &= 0, \\ &\;\;\vdots \\ b_{n-1,2}t_2 + \cdots + b_{n-1,n}t_n &= 0, \end{aligned} \qquad (1.6.8)$$

where the $b_{ij}$ are defined as in the proof of Theorem 1.6.1. By the induction hypothesis there exist values of $t_2,\ldots,t_n$, not all zero, satisfying the last $n-2$ equations in (1.6.8); and, with a suitable choice of $t_1$, the first equation can be satisfied, too. But the values $t_1,\ldots,t_n$ which satisfy (1.6.8) also satisfy (1.6.7), and the theorem is therefore proved.
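The direct proof just given is, in effect, an algorithm. The Python sketch below follows it step by step. It assumes the coefficients are exact rationals and uses `fractions.Fraction`; the function name is hypothetical. The steps are:

1. Look for an equation in which $t_1$ occurs. The text instead assumes $a_{11}\neq 0$ after renumbering.
2. Form the $b_{ij}$ as in the proof of Theorem 1.6.1.
3. Solve the smaller system recursively.
4. Choose $t_1$ so that the pivot equation holds.

The recursion only needs fewer equations than unknowns, so the sketch accepts any such system, not just exactly $n-1$ equations.

```python
from fractions import Fraction


def nontrivial_solution(rows, n):
    """Return t_1, ..., t_n, not all zero, with sum_j rows[i][j] * t_j = 0 for each i.

    Follows the elimination in the direct proof of Theorem 1.6.2; requires
    fewer equations than unknowns (len(rows) < n).
    """
    if len(rows) >= n:
        raise ValueError("need fewer equations than unknowns")
    rows = [[Fraction(x) for x in row] for row in rows]
    # No equations, or t_1 occurs in none of them: t_1 = 1, t_2 = ... = t_n = 0.
    pivot = next((i for i, row in enumerate(rows) if row[0] != 0), None)
    if pivot is None:
        return [Fraction(1)] + [Fraction(0)] * (n - 1)
    first = rows[pivot]
    others = rows[:pivot] + rows[pivot + 1:]
    # b_ij = a_ij - a_i1 * a_1j / a_11: eliminate t_1 from the remaining equations.
    reduced = [[row[j] - row[0] * first[j] / first[0] for j in range(1, n)] for row in others]
    tail = nontrivial_solution(reduced, n - 1)       # t_2, ..., t_n, not all zero
    # Choose t_1 so that the pivot equation is satisfied, as in (1.6.4).
    t1 = -sum(first[j] * tail[j - 1] for j in range(1, n)) / first[0]
    return [t1] + tail


equations = [[1, 2, 3], [4, 5, 6]]                 # two equations in three unknowns
t = nontrivial_solution(equations, 3)
print(t)                                          # [Fraction(1, 1), Fraction(-2, 1), Fraction(1, 1)]
print([sum(a * x for a, x in zip(row, t)) for row in equations])   # [Fraction(0, 1), Fraction(0, 1)]
```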

Exercise 1.6.1. Let $1 \le m < n$ and let $a_{ij}$ $(i = 1,\ldots,m;\ j = 1,\ldots,n)$ be given numbers. Show that there exist numbers $t_1,\ldots,t_n$, not all 0, such that

$$a_{i1}t_1 + \cdots + a_{in}t_n = 0 \qquad (i = 1,\ldots,m).$$

1.6.2. As a first application of Theorem 1.6.1 we shall prove a well-known result on polynomials, which will be useful in later chapters.

Theorem 1.6.3. If the polynomial

$$f(x) = c_0x^n + c_1x^{n-1} + \cdots + c_{n-1}x + c_n$$

vanishes for $n+1$ distinct values of $x$, then it vanishes identically.





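Let $x_1,\ldots,x_{n+1}$ be distinct numbers, and suppose that $f(x_1) = \cdots = f(x_{n+1}) = 0$, i.e.

$$\begin{aligned} c_0x_1^n + c_1x_1^{n-1} + \cdots + c_{n-1}x_1 + c_n &= 0, \\ &\;\;\vdots \\ c_0x_{n+1}^n + c_1x_{n+1}^{n-1} + \cdots + c_{n-1}x_{n+1} + c_n &= 0. \end{aligned}$$

By (1.4.6), p. 17, the Vandermonde determinant

$$\begin{vmatrix} x_1^n & x_1^{n-1} & \cdots & x_1 & 1 \\ \vdots & \vdots & & \vdots & \vdots \\ x_{n+1}^n & x_{n+1}^{n-1} & \cdots & x_{n+1} & 1 \end{vmatrix}$$

is equal to $\prod_{1 \le i < j \le n+1} (x_i - x_j)$, and is therefore not equal to zero. It follows by Theorem 1.6.1 that $c_0 = c_1 = \cdots = c_n = 0$, i.e. that $f(x)$ vanishes identically.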


Corollary. If $f(x)$, $g(x)$ are polynomials, and there exists a constant $x_0$ such that

$$f(x) = g(x)$$

whenever $x > x_0$, then the equality holds for all values of $x$.

Let $n$ be the greater of the degrees of $f$ and $g$. Now $f(x) - g(x)$ vanishes for any $n+1$ distinct values of $x$ which exceed $x_0$, and the assertion follows, therefore, by Theorem 1.6.3.

1.6.3. Theorem 1.6.1 enables us to dispense with the comparatively deep Theorem 1.5.1 (p. 24) in the proof of Theorem 1.5.2. As we recall, there is only a difficulty when $D = 0$, and in that case we have to show that $D^* = 0$. We write, as before, $D = |a_{ij}|_n$, $D^* = |A_{ij}|_n$. We assume (as we may clearly do) that at least one element in $D$, say $a_{hk}$, does not vanish. In view of Theorem 1.4.3 (p. 20) and the assumption $D = 0$ we infer that the relations

$$\sum_{j=1}^{n} A_{ij}t_j = 0 \qquad (i = 1,\ldots,n)$$

are satisfied for $t_1 = a_{h1}, \ldots, t_n = a_{hn}$. But here $t_k = a_{hk} \neq 0$ and so, by Theorem 1.6.1,

$$D^* = \begin{vmatrix} A_{11} & \cdots & A_{1n} \\ \vdots & & \vdots \\ A_{n1} & \cdots & A_{nn} \end{vmatrix} = 0.$$

1.6.4. It is useful to possess some easily applicable criteria for 
deciding whether a determinant does or does not vanish. Below 
we shall deduce one such criterion due to Minkowski (1900). 






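Definition 1.6.1. A determinant $|a_{rs}|_n$ is dominated by its diagonal elements if

$$|a_{rr}| > \sum_{\substack{s=1 \\ s \neq r}}^{n} |a_{rs}| \qquad (r = 1,\ldots,n). \qquad (1.6.9)$$

Theorem 1.6.4. A determinant which is dominated by its diagonal elements does not vanish.

Let $D = |a_{rs}|_n$ be the determinant in question and suppose that $D = 0$. Then, by Theorem 1.6.1, there exist numbers $t_1,\ldots,t_n$, not all 0, such that

$$\sum_{s=1}^{n} a_{rs}t_s = 0 \qquad (r = 1,\ldots,n). \qquad (1.6.10)$$

We have, for some $k$,

$$\max_{1 \le s \le n} |t_s| = |t_k| \neq 0.$$

Then, by (1.6.10),

$$a_{kk}t_k = -\sum_{\substack{s=1 \\ s \neq k}}^{n} a_{ks}t_s,$$

and so

$$|a_{kk}|\,|t_k| \le \sum_{\substack{s=1 \\ s \neq k}}^{n} |a_{ks}|\,|t_s| \le \sum_{\substack{s=1 \\ s \neq k}}^{n} |a_{ks}|\,|t_k|.$$

Therefore

$$|a_{kk}| \le \sum_{\substack{s=1 \\ s \neq k}}^{n} |a_{ks}|,$$

and this contradicts the hypothesis (1.6.9).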
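Definition 1.6.1 is easy to test mechanically, and Theorem 1.6.4 can then be observed on examples. The Python sketch below is an illustration only. `random_dominated` is a hypothetical generator: it fills an array with random integers and then enlarges each diagonal element, with a random sign, until (1.6.9) holds. Every determinant produced this way should be non-zero.

```python
import random


def det(a):
    """Determinant by expansion in terms of the first row (Theorem 1.4.1)."""
    if not a:
        return 1
    return sum(
        (-1) ** k * a[0][k] * det([row[:k] + row[k + 1:] for row in a[1:]])
        for k in range(len(a))
    )


def is_dominated(a):
    """Definition 1.6.1: |a_rr| > sum of |a_rs| over s != r, for every r."""
    return all(
        abs(a[r][r]) > sum(abs(x) for s, x in enumerate(a[r]) if s != r)
        for r in range(len(a))
    )


def random_dominated(n, rng):
    """Random integer array whose diagonal is made large enough to satisfy (1.6.9)."""
    a = [[rng.randint(-5, 5) for _ in range(n)] for _ in range(n)]
    for r in range(n):
        off_diagonal = sum(abs(a[r][s]) for s in range(n) if s != r)
        a[r][r] = rng.choice([-1, 1]) * (off_diagonal + rng.randint(1, 3))
    return a


rng = random.Random(0)
samples = [random_dominated(5, rng) for _ in range(50)]
print(all(is_dominated(a) for a in samples))   # True
print(all(det(a) != 0 for a in samples))       # True, as Theorem 1.6.4 predicts
```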

Exercise 1.6.2. Show that if even a single one of the $n$ signs '$>$' in (1.6.9) is replaced by '$\ge$', then the conclusion of Theorem 1.6.4 need no longer be true.

A result related to Theorem 1.6.4 is as follows. 

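Theorem 1.6.5. If $D$ is a determinant, with real elements and positive diagonal elements, which is dominated by its diagonal elements, then its value is positive.

We give a proof due to Furtwängler (1936). The theorem is obviously true for $n = 2$. Assume that it holds for $n-1$, where $n \ge 3$. We know that $a_{11} > 0$. Subtracting, for $r = 2,\ldots,n$, $a_{r1}/a_{11}$ times the first row from the $r$th row, we obtain $D = a_{11}D'$, where

$$D' = \begin{vmatrix} b_{22} & \cdots & b_{2n} \\ \vdots & & \vdots \\ b_{n2} & \cdots & b_{nn} \end{vmatrix}$$

and

$$b_{rs} = a_{rs} - \frac{a_{r1}a_{1s}}{a_{11}} \qquad (r, s = 2,\ldots,n).$$

Now, by hypothesis,

$$\sum_{s=2}^{n} |a_{1s}| < a_{11}.$$

Therefore

$$\sum_{\substack{s=2 \\ s \neq r}}^{n} |a_{1s}| < a_{11} - |a_{1r}| \qquad (r = 2,\ldots,n),$$

and hence, for $r = 2,\ldots,n$,

$$\begin{aligned} \sum_{\substack{s=2 \\ s \neq r}}^{n} |b_{rs}| &\le \sum_{\substack{s=2 \\ s \neq r}}^{n} |a_{rs}| + \frac{|a_{r1}|}{a_{11}} \sum_{\substack{s=2 \\ s \neq r}}^{n} |a_{1s}| \\ &\le \sum_{\substack{s=2 \\ s \neq r}}^{n} |a_{rs}| + |a_{r1}| - \frac{|a_{r1}a_{1r}|}{a_{11}} \\ &= \sum_{\substack{s=1 \\ s \neq r}}^{n} |a_{rs}| - \frac{|a_{r1}a_{1r}|}{a_{11}} < a_{rr} - \frac{|a_{r1}a_{1r}|}{a_{11}} \le b_{rr}. \end{aligned}$$

Thus

$$b_{rr} > \sum_{\substack{s=2 \\ s \neq r}}^{n} |b_{rs}| \qquad (r = 2,\ldots,n).$$

In particular $b_{rr} > 0$, so $D'$ is a real determinant with positive diagonal elements which is dominated by its diagonal elements. Hence, by the induction hypothesis, $D' > 0$, and so $D > 0$. The proof is therefore complete.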
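Furtwängler's argument amounts to Gaussian elimination without row interchanges. Each step replaces $D$ by $a_{11}D'$. The proof shows that $D'$ is again real, dominated by its diagonal elements, and has a positive diagonal. The Python sketch below is an illustration, and its function names are hypothetical. It repeats the step until nothing is left, asserting dominance and a positive leading element at every stage, so that $D$ appears as a product of positive pivots. Exact rational arithmetic (`fractions.Fraction`) is assumed in order to avoid rounding.

```python
from fractions import Fraction


def is_dominated(a):
    """Definition 1.6.1: |a_rr| > sum of |a_rs| over s != r, for every r."""
    return all(
        abs(a[r][r]) > sum(abs(x) for s, x in enumerate(a[r]) if s != r)
        for r in range(len(a))
    )


def eliminate_first(a):
    """b_rs = a_rs - a_r1 * a_1s / a_11 (r, s = 2, ..., n), as in the proof."""
    return [
        [a[r][s] - a[r][0] * a[0][s] / a[0][0] for s in range(1, len(a))]
        for r in range(1, len(a))
    ]


def value_by_elimination(a):
    """Return D = a_11 * D' = ..., asserting dominance and a positive pivot at each stage."""
    a = [[Fraction(x) for x in row] for row in a]
    value = Fraction(1)
    while a:
        assert a[0][0] > 0 and is_dominated(a), "hypotheses of Theorem 1.6.5 fail"
        value *= a[0][0]
        a = eliminate_first(a)
    return value


a = [[5, 1, -2], [1, 4, 2], [-1, 2, 6]]
print(value_by_elimination(a))   # 80
```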

It is possible to obtain further refinements of Theorems 1.6.4 and 1.6.5, and we conclude this chapter by stating two of these without proof.

(i) If $D = |a_{ij}|_n$ is a determinant satisfying the conditions of Theorem 1.6.4, then

$$\prod_{r=1}^{n} (|a_{rr}| - \alpha_r) \le |D| \le \prod_{r=1}^{n} (|a_{rr}| + \alpha_r),$$

where

$$\alpha_r = \begin{cases} \sum_{s=r+1}^{n} |a_{rs}| & (1 \le r < n), \\ 0 & (r = n). \end{cases}$$

(ii) If $D = |a_{ij}|_n$ is a determinant satisfying the conditions of Theorem 1.6.5, then

$$\prod_{r=1}^{n} (a_{rr} - \alpha_r) \le D \le \prod_{r=1}^{n} (a_{rr} + \alpha_r),$$

where $\alpha_r$ is defined as before.†


PROBLEMS ON CHAPTER I 
1. Evaluate the determinant 


ipl 

Vi 

a 

b 

20^2 

^2 

Vz 

c 

63 J 3 

Sajg 

^2 

2/8 

240^4 

12a;4 




† I am indebted to Professor A. Oppenheim for these estimates. An independent proof has been given by G. B. Price, Proc. Amer. Math. Soc. 2 (1951), 497-502.



2. Express the determinant

$$\begin{vmatrix} 1 & a^2 - bc & a^4 \\ 1 & b^2 - ca & b^4 \\ 1 & c^2 - ab & c^4 \end{vmatrix}$$

as the product of one quadratic and four linear factors.

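3. By squaring the determinant

$$\begin{vmatrix} b & c & 0 \\ a & 0 & c \\ 0 & a & b \end{vmatrix},$$

show that

$$\begin{vmatrix} b^2 + c^2 & ab & ca \\ ab & c^2 + a^2 & bc \\ ca & bc & a^2 + b^2 \end{vmatrix} = 4a^2b^2c^2.$$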

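4. Use Laplace's expansion to show that

$$\begin{vmatrix} a & -b & -a & b \\ b & a & -b & -a \\ c & -d & c & -d \\ d & c & d & c \end{vmatrix} = 4(a^2 + b^2)(c^2 + d^2).$$

5. Find the discriminant of the equation $x^4 + ax + b = 0$. Hence show that the equation $x^4 + 4x + 3 = 0$ has at least two coincident roots.

6. If $s_r = \alpha^r + \beta^r + \gamma^r$ and

$$\Delta_n = \begin{vmatrix} s_n & s_{n+1} & s_{n+2} \\ s_{n+1} & s_{n+2} & s_{n+3} \\ s_{n+2} & s_{n+3} & s_{n+4} \end{vmatrix},$$

show that, for any positive integer $n$,

$$\Delta_n = (\alpha\beta\gamma)^n(\beta - \gamma)^2(\gamma - \alpha)^2(\alpha - \beta)^2.$$

Show that, when $\alpha, \beta, \gamma$ are the roots of the equation $x^3 + bx + c = 0$, then

$$\Delta_n = (-1)^{n+1}c^n(4b^3 + 27c^2).$$

7. Show that |oy|n = where

8. If $S = a_1 + \cdots + a_n$ and $A_i = S - a_i$ $(i = 1,\ldots,n)$, prove that

$$\begin{vmatrix} x - A_1 & a_2 & a_3 & \cdots & a_n \\ a_1 & x - A_2 & a_3 & \cdots & a_n \\ \vdots & \vdots & \vdots & & \vdots \\ a_1 & a_2 & a_3 & \cdots & x - A_n \end{vmatrix} = x(x - S)^{n-1}.$$

9. Denoting by $\Delta_n$ the $n$-rowed determinant

$$\begin{vmatrix} 1 + x^2 & x & 0 & \cdots & 0 \\ x & 1 + x^2 & x & \cdots & 0 \\ 0 & x & 1 + x^2 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & 1 + x^2 \end{vmatrix},$$

prove that $\Delta_n - \Delta_{n-1} = x^2(\Delta_{n-1} - \Delta_{n-2})$; hence evaluate $\Delta_n$.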







10. A determinant $D$ is of order $n$, and a determinant $E$ is obtained from it by subtracting from each element of $D$ the sum of the other elements in the same column. Prove that $E = -(n-2)\,2^{n-1}D$.

11. $D_n$ is the $n$-rowed determinant in which the elements in the diagonal are $a$, those immediately above and immediately below the diagonal are $b$ and $c$ respectively, and all the remaining elements are 0. Obtain a relation between $D_n$, $D_{n-1}$, and $D_{n-2}$; evaluate $D_n$ when $a = 1 + bc$.

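12. Let $D_n$ be the $n$-rowed determinant

$$\begin{vmatrix} x & a & 0 & 0 & \cdots \\ a & x & a & 0 & \cdots \\ 0 & a & x & a & \cdots \\ 0 & 0 & a & x & \cdots \\ \vdots & \vdots & \vdots & \vdots & \end{vmatrix}.$$

Show how to find the value of $D_n$; and prove that, if $a = 1$, $x = 2\cos\theta$ $(0 < \theta < \pi)$, then $D_n = \sin(n+1)\theta/\sin\theta$.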

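13. Prove that

$$\begin{vmatrix} 1 + a_1 & a_2 & a_3 & \cdots & a_n \\ a_1 & 1 + a_2 & a_3 & \cdots & a_n \\ a_1 & a_2 & 1 + a_3 & \cdots & a_n \\ \vdots & \vdots & \vdots & & \vdots \\ a_1 & a_2 & a_3 & \cdots & 1 + a_n \end{vmatrix} = 1 + a_1 + a_2 + a_3 + \cdots + a_n,$$

and deduce that

$$\begin{vmatrix} x & a_2 & a_3 & \cdots & a_n \\ a_1 & x & a_3 & \cdots & a_n \\ a_1 & a_2 & x & \cdots & a_n \\ \vdots & \vdots & \vdots & & \vdots \\ a_1 & a_2 & a_3 & \cdots & x \end{vmatrix} = \prod_{r=1}^{n} (x - a_r) + \sum_{r=1}^{n} a_r \prod_{\substack{s=1 \\ s \neq r}}^{n} (x - a_s).$$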



14. Prove the identity



, + n / i,j = l 

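15. Evaluate the determinant $|a_{ij}|_n$, where $a_{ij} = |x_i - x_j|$ and $x_1,\ldots,x_n$ are given real numbers.

Evaluate the determinant

$$\begin{vmatrix} 0 & 1 & 2 & 3 & \cdots & n-1 \\ 1 & 0 & 1 & 2 & \cdots & n-2 \\ 2 & 1 & 0 & 1 & \cdots & n-3 \\ \vdots & \vdots & \vdots & \vdots & & \vdots \\ n-1 & n-2 & n-3 & n-4 & \cdots & 0 \end{vmatrix}.$$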


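16. Show that

$$\begin{vmatrix} a_{11} & \cdots & a_{1n} & p_1 \\ \vdots & & \vdots & \vdots \\ a_{n1} & \cdots & a_{nn} & p_n \\ q_1 & \cdots & q_n & r \end{vmatrix} = r\begin{vmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nn} \end{vmatrix} - \sum_{i,j=1}^{n} A_{ij}p_iq_j,$$

where $A_{ij}$ is the cofactor of $a_{ij}$ in $|a_{ij}|_n$.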















17. Let $A_{ij}$ denote the cofactor of $a_{ij}$ in the determinant $A = |a_{ij}|_n$, which is such that $a_{ij} = a_{ji}$ $(i, j = 1,\ldots,n)$. Show that



• • ^in 

% 

*^nl 




• • Wn 

0 




Deduce, (i) by using Theorem 1.4.3 (p. 20), (ii) by using Theorem 1.5.2 (p. 24), that, for $k = 1,\ldots,n$,


All 

' • 

• -^in 


Ant ■ 

, 

• -^nn 

^nk 

^k\ 

. 

• Ajin 

0 




18. Prove that, if i^(^) = where (i^j — l,...,n), then 


m n = F(^) n {«+^r). 

f-l r-l 


Deduce that 


1 

Xi , 



. . xj 


1 a?! . . 

. a:J"* 

1 

• « 


a:»+i . 

• • *!! 

= «Jfe 

1 . 



where is the coefficient of in XJ 

f-l 

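19. Let $a_{rs}(x)$ $(r, s = 1,\ldots,n)$ be differentiable functions of $x$ and denote by $D(x)$ the determinant

$$\begin{vmatrix} a_{11}(x) & \cdots & a_{1n}(x) \\ \vdots & & \vdots \\ a_{n1}(x) & \cdots & a_{nn}(x) \end{vmatrix}.$$

Show that $D'(x) = D_1(x) + \cdots + D_n(x)$, where $D_k(x)$ denotes the determinant obtained from $D(x)$ by differentiating the elements of the $k$th column only.

20. Show that the determinant

$$\begin{vmatrix} f_1(x_1) & \cdots & f_n(x_1) \\ \vdots & & \vdots \\ f_1(x_n) & \cdots & f_n(x_n) \end{vmatrix},$$

where $f_1,\ldots,f_n$ are polynomials and $x_1,\ldots,x_n$ are variables, is divisible by

$$\prod_{1 \le i < j \le n} (x_i - x_j).$$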


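21. Show that the value of the 'circulant'

$$\begin{vmatrix} a_1 & a_2 & \cdots & a_{n-1} & a_n \\ a_n & a_1 & \cdots & a_{n-2} & a_{n-1} \\ \vdots & \vdots & & \vdots & \vdots \\ a_2 & a_3 & \cdots & a_n & a_1 \end{vmatrix}$$

is equal to

$$\prod_{k=1}^{n} (a_1 + a_2\omega_k + a_3\omega_k^2 + \cdots + a_n\omega_k^{n-1}),$$

where $\omega_1,\ldots,\omega_n$ are the $n$th roots of unity. Deduce that

$$\begin{vmatrix} x & y & z \\ z & x & y \\ y & z & x \end{vmatrix} = (x + y + z)(x + y\rho + z\rho^2)(x + y\rho^2 + z\rho),$$

where $\rho = e^{2\pi i/3}$.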

22. Let $|a_{ij}|_n$ be a determinant whose value is zero. Show that, if $A_{ij}$ denotes the cofactor of $a_{ij}$, then

d s). 

If no cofactor vanishes, show that the cofactors of elements in any one row 
(column) are proportional to the cofactors of elements in any other row 
(column). 

23. The determinant $|a_{ij}|_n$ is said to be skew-symmetric if $a_{ij} = -a_{ji}$ $(i, j = 1,\ldots,n)$. Show that every skew-symmetric determinant of odd order is equal to zero, and that every skew-symmetric determinant of even order is equal to the square of a polynomial in the elements of the determinant.

24. Let $c_1, c_2, \ldots, c_n$ be given integers whose highest common factor is 1. Show (say by induction with respect to $|c_1| + |c_2| + \cdots + |c_n|$) that there exists an $n$-rowed determinant with integral elements whose first row is $c_1, c_2, \ldots, c_n$ and whose value is 1.

25. Let $D = |a_{rs}|_n$ be a determinant with complex elements, and write

$$\Lambda_r = \sum_{\substack{s=1 \\ s \neq r}}^{n} |a_{rs}|.$$

(i) Show that, if $|a_{11}| \ge \Lambda_1 \neq 0$ and $|a_{rr}| > \Lambda_r$ $(r = 2,\ldots,n)$, then $D \neq 0$.

(ii) Show that, if $|a_{rr}|\,|a_{ss}| > \Lambda_r\Lambda_s$ $(r, s = 1,\ldots,n;\ r \neq s)$, then $D \neq 0$.

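26. Let $n \ge 2$ and let $a_{rs}$ $(r, s = 1,\ldots,n)$ and $b_r$ $(r = 1,\ldots,n-1)$ be given numbers. Suppose that, for every value of $r$ in the range $1 \le r < n$,

$$\begin{vmatrix} b_r & a_{r,r+1} & a_{r,r+2} & \cdots & a_{rn} \\ a_{r+1,r} & a_{r+1,r+1} & a_{r+1,r+2} & \cdots & a_{r+1,n} \\ \vdots & \vdots & \vdots & & \vdots \\ a_{nr} & a_{n,r+1} & a_{n,r+2} & \cdots & a_{nn} \end{vmatrix} = 0.$$

Prove, by induction with respect to $n$, that

$$|a_{ij}|_n = (a_{11} - b_1) \cdots (a_{n-1,n-1} - b_{n-1})\,a_{nn}.$$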


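27. Let $D = |a_{ij}|_n$ and denote by $A_{ij}$ the cofactor of $a_{ij}$. By considering the determinant

$$\begin{vmatrix} a_{11} - D/A_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{vmatrix}$$

and making use of Theorem 1.6.4, prove (by induction with respect to $n$) the two results stated at the end of § 1.6.4.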





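28. Let $1 \le n < N$ and suppose that the rectangular array of complex numbers

$$\begin{matrix} a_{11} & a_{12} & \cdots & a_{1N} \\ a_{21} & a_{22} & \cdots & a_{2N} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nN} \end{matrix}$$

is such that

$$|a_{rr}| > \sum_{\substack{s=1 \\ s \neq r}}^{N} |a_{rs}| \qquad (r = 1,\ldots,n).$$

By using the idea of the proof of Theorem 1.6.5 (p. 32), show that the modulus of the determinant

$$\begin{vmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nn} \end{vmatrix}$$

is greater than the modulus of any other $n$-rowed minor selected from the array. Deduce Theorem 1.6.4.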

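29. Give a proof of Theorem 1.6.5 by considering the determinant

$$D(x) = \begin{vmatrix} a_{11} + x & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} + x & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} + x \end{vmatrix}$$

and making use of Theorem 1.6.4 and of properties of continuous functions.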





II

VECTOR SPACES AND LINEAR MANIFOLDS

We shall find it convenient to use the customary symbolism to denote class membership, writing $x \in X$ to indicate that an object $x$ is a member of a class $X$ and $x \notin X$ to indicate that it is not a member of $X$. We also write $x, y, \ldots \in X$ to indicate that $x, y, \ldots$ all belong to $X$. If $X$ and $Y$ are two classes such that, whenever $x \in X$, then $x \in Y$, we say that $X$ is contained in $Y$, and we write $X \subset Y$ (or $Y \supset X$). The relation $X \subset Y$ does not, of course, preclude the possibility that $X$ and $Y$ are the same class.

2.1. The algebra of vectors 

In this chapter we shall develop the algebra of vectors and 
examine the axiomatic foundations of the theory. Many of the 
ideas we shall meet in the course of our discussion originated in the 
work of Grassmann.†

2.1.1. In the present section we propose to deal with the simplest
properties of operations defined for vectors. We begin by introducing
the concept of a field, which plays a fundamental part throughout
algebra.

Definition 2.1.1. A FIELD is a set of (not fewer than two) numbers
which is closed with respect to the four rational operations of addition,
subtraction, multiplication, and division by any non-zero number.‡

This definition may be stated more fully as follows. A set $\mathfrak{F}$ of
numbers is a field if, whenever we have $a, b \in \mathfrak{F}$, we also have
$a + b$, $a - b$, $ab$ and, provided that $b \ne 0$, $a/b \in \mathfrak{F}$.

Obvious instances of fields are the set of all rational numbers,
the set of all real numbers, and the set of all complex numbers. On
the other hand, the set of integers is not a field since it is not closed
with respect to division, and the set of positive numbers is not a
field since it is not closed with respect to subtraction. In what
follows we shall normally be concerned with the set of all real
numbers (the real field) and the set of all complex numbers (the
complex field).

† For bibliography and comment see Hamburger and Grimshaw, 21, 171.

‡ The system we have defined should, strictly speaking, be called a number
field since the normal usage of the word ‘field’ embraces systems more general
than that with which we are concerned.

It may be of interest to note that every field contains the field
of rational numbers. For, since a field $\mathfrak{F}$ contains a number other
than 0, suppose that $a \in \mathfrak{F}$, $a \ne 0$. Then $0 = a - a \in \mathfrak{F}$. Also
$1 = a/a \in \mathfrak{F}$, and hence, for every positive integer $k$ we have
$k = 1 + 1 + \dots + 1 \in \mathfrak{F}$. This, in turn, implies that $-k = 0 - k \in \mathfrak{F}$,
and so $l/m \in \mathfrak{F}$, where $l, m$ are any integers and $m \ne 0$. Thus every
rational number belongs to $\mathfrak{F}$.
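The argument is constructive: starting from any non-zero $a$, it produces every rational number using only the four rational operations. The minimal Python sketch below follows the same steps. It uses `fractions.Fraction` as a stand-in for elements of the field, which is an assumption made for illustration, and the function name `rational_from` is hypothetical.

```python
from fractions import Fraction


def rational_from(a, l, m):
    """Produce l/m from a non-zero field element a using only the four rational
    operations, following the steps of the argument above."""
    if a == 0 or m == 0:
        raise ValueError("need a != 0 and m != 0")
    zero = a - a          # 0 = a - a
    one = a / a           # 1 = a / a

    def integer(k):
        """k = 1 + 1 + ... + 1 for k >= 0, and -|k| = 0 - |k| otherwise."""
        total = zero
        for _ in range(abs(k)):
            total = total + one
        return total if k >= 0 else zero - total

    return integer(l) / integer(m)


print(rational_from(Fraction(7, 3), -6, 4))  # -3/2
```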

We next introduce vectors. In elementary mathematics a vector
is described as an entity having both magnitude and direction.
We can handle such entities analytically by representing them as 
pairs or triads of numbers. An immediate extension of this idea 
leads to the definition of vectors now to be adopted. 

Definition 2.1.2. Let $\mathfrak{F}$ be a field. A VECTOR $\mathbf{x}$ OF ORDER $n$ $(> 0)$
OVER $\mathfrak{F}$ is an ordered set $(x_1, x_2, \dots, x_n)$ of $n$ numbers which belong
to $\mathfrak{F}$.

Vectors will normally be denoted by small letters in bold-face
type. If $\mathbf{x} = (x_1, x_2, \dots, x_n)$,
then $x_1, x_2, \dots, x_n$ are called the first, second, ..., $n$th components of $\mathbf{x}$.

Definition 2.1.2 implies that two vectors (of the same order) are
equal if and only if all their corresponding components are equal.
Thus, if $\mathbf{x} = (x_1, x_2, \dots, x_n)$ and $\mathbf{y} = (y_1, y_2, \dots, y_n)$, then $\mathbf{x} = \mathbf{y}$
means that $x_1 = y_1, x_2 = y_2, \dots, x_n = y_n$.

The field $\mathfrak{F}$ from which the components are chosen is often called
the reference field. Vectors over the real field are known as real
vectors and those over the complex field as complex vectors. In
contrast to vectors over $\mathfrak{F}$, the numbers of $\mathfrak{F}$ are called scalars.

Definition 2.1.3. The ZERO VECTOR of order $n$ is the vector
$(0, 0, \dots, 0)$ with $n$ components. It is denoted by $\mathbf{0}$.

Throughout this chapter all numbers are supposed to belong to a
field $\mathfrak{F}$ which is assumed to be specified once and for all.

The introduction of vectors only becomes interesting when we
define and investigate operations performed on them.

Definition 2.1.4. The MULTIPLICATION OF A VECTOR BY A
SCALAR is defined by the formula

$$\alpha\mathbf{x} = \mathbf{x}\alpha = (\alpha x_1, \dots, \alpha x_n),$$

where $\mathbf{x} = (x_1, \dots, x_n)$.




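We note the following obvious results.

$$1\mathbf{x} = \mathbf{x}, \qquad (-1)\mathbf{x} = (-x_1, \dots, -x_n),$$

$$0\mathbf{x} = \mathbf{0}, \qquad \alpha\mathbf{0} = \mathbf{0}, \qquad (\alpha\beta)\mathbf{x} = \alpha(\beta\mathbf{x}).$$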

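Definition 2.1.5. VECTOR ADDITION, i.e. addition of vectors of
the same order, is defined by the formula

$$\mathbf{x} + \mathbf{y} = (x_1 + y_1, \dots, x_n + y_n),$$

where $\mathbf{x} = (x_1, \dots, x_n)$ and $\mathbf{y} = (y_1, \dots, y_n)$.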

From the last two definitions a number of important consequences
can be inferred immediately. Thus we have the following
results.

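Theorem 2.1.1. (i) Vector addition is commutative and associative,† i.e.

$$\mathbf{x} + \mathbf{y} = \mathbf{y} + \mathbf{x}, \qquad \mathbf{x} + (\mathbf{y} + \mathbf{z}) = (\mathbf{x} + \mathbf{y}) + \mathbf{z}.$$

(ii) Vector addition and multiplication by scalars are connected by
the following distributive laws:

$$\alpha(\mathbf{x} + \mathbf{y}) = \alpha\mathbf{x} + \alpha\mathbf{y}, \qquad (\alpha + \beta)\mathbf{x} = \alpha\mathbf{x} + \beta\mathbf{x}.$$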

Exercise 2.1.1. Write out the proof of Theorem 2.1.1.
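Exercise 2.1.1 asks for a proof, and a computer can only spot-check the laws on particular vectors. The sketch below represents a vector of order $n$ as a Python tuple of `fractions.Fraction` components, which takes the rational field as the reference field by assumption. It implements Definitions 2.1.4 and 2.1.5 and the difference of Theorem 2.1.2. The helper names `add`, `scale` and `sub` are our own.

```python
from fractions import Fraction


def add(x, y):
    """Vector addition (Definition 2.1.5): componentwise sum, same order required."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same order")
    return tuple(xi + yi for xi, yi in zip(x, y))


def scale(alpha, x):
    """Multiplication of the vector x by the scalar alpha (Definition 2.1.4)."""
    return tuple(alpha * xi for xi in x)


def sub(x, y):
    """The unique z with y + z = x (Theorem 2.1.2), i.e. x - y."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same order")
    return tuple(xi - yi for xi, yi in zip(x, y))


x = (Fraction(1), Fraction(-2), Fraction(3))
y = (Fraction(1, 2), Fraction(0), Fraction(4))
z = (Fraction(5), Fraction(1, 3), Fraction(-1))
alpha, beta = Fraction(2, 3), Fraction(-4)

# Spot checks of Theorem 2.1.1 on these particular vectors (not a proof).
assert add(x, y) == add(y, x)
assert add(x, add(y, z)) == add(add(x, y), z)
assert scale(alpha, add(x, y)) == add(scale(alpha, x), scale(alpha, y))
assert scale(alpha + beta, x) == add(scale(alpha, x), scale(beta, x))
assert add(y, sub(x, y)) == x
```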

The next theorem allows us to define the operation of subtraction.

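Theorem 2.1.2. If $\mathbf{x}, \mathbf{y}$ are any vectors (of the same order), then
there exists a unique vector $\mathbf{z}$ satisfying the equation

$$\mathbf{y} + \mathbf{z} = \mathbf{x}.$$

For let $\mathbf{x} = (x_1, \dots, x_n)$, $\mathbf{y} = (y_1, \dots, y_n)$. Then

$$\mathbf{z} = (x_1 - y_1, \dots, x_n - y_n)$$

satisfies the above equation and is clearly the only vector to do so.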

Definition 2.1.6. The vector $\mathbf{z}$ of the previous theorem is called
the DIFFERENCE of $\mathbf{x}$ and $\mathbf{y}$ and is denoted by $\mathbf{x} - \mathbf{y}$. Moreover, we write
$-\mathbf{x}$ for $\mathbf{0} - \mathbf{x}$.

† The associativity of vector addition enables us to use expressions such as
$\mathbf{x} + \mathbf{y} + \mathbf{z}$ without ambiguity.




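A number of results relating to subtraction now follow at once.
Thus we have

$$\mathbf{x} - \mathbf{x} = \mathbf{0}, \qquad (-1)\mathbf{x} = -\mathbf{x}, \qquad -(-\mathbf{x}) = \mathbf{x},$$

$$\mathbf{x} - \mathbf{y} = -(\mathbf{y} - \mathbf{x}), \qquad \mathbf{x} - (\mathbf{y} - \mathbf{z}) = (\mathbf{x} - \mathbf{y}) + \mathbf{z},$$

$$\alpha(\mathbf{x} - \mathbf{y}) = \alpha\mathbf{x} - \alpha\mathbf{y}, \qquad (\alpha - \beta)\mathbf{x} = \alpha\mathbf{x} - \beta\mathbf{x},$$

$$\mathbf{x} - \mathbf{y} = \mathbf{z} \ \text{ implies } \ \mathbf{x} = \mathbf{y} + \mathbf{z}.$$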

2.1.2. Definition 2.1.7. An expression of the form

$$\alpha_1\mathbf{x}_1 + \dots + \alpha_k\mathbf{x}_k,$$

where $\alpha_1, \dots, \alpha_k$ are scalars, is called a LINEAR COMBINATION of the
vectors $\mathbf{x}_1, \dots, \mathbf{x}_k$. If a vector $\mathbf{x}$ is equal to some linear combination of
$\mathbf{x}_1, \dots, \mathbf{x}_k$, it is said to be expressible linearly in terms of, or to depend
linearly on, $\mathbf{x}_1, \dots, \mathbf{x}_k$.

It is obvious that a linear combination of vectors is again a vector. 
We shall be particularly interested in sets of vectors having the 
property that any linear combination of vectors of such a set is 
again a vector of the set. Sets of this type are introduced in the next 
definition. 

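Definition 2.1.8. A VECTOR SPACE of order $n$ over $\mathfrak{F}$ is a set
$\mathfrak{V}$ of vectors of order $n$ over $\mathfrak{F}$ with the property that, whenever
$\mathbf{x}, \mathbf{y} \in \mathfrak{V}$, $\alpha \in \mathfrak{F}$, we have $\alpha\mathbf{x}, \mathbf{x} + \mathbf{y} \in \mathfrak{V}$.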

This definition states, in fact, that a vector space is a set of 
vectors closed with respect to vector addition and multiplication 
by scalars. The most obvious instance of a vector space is, of course, 
the set of all vectors of a given order. 

Definition 2.1.9. The set of all vectors of order $n$ over $\mathfrak{F}$ is called
the TOTAL VECTOR SPACE OF ORDER $n$ OVER $\mathfrak{F}$, and is denoted by
$\mathfrak{V}_n(\mathfrak{F})$ or, more briefly, by $\mathfrak{V}_n$.

It is not difficult to give other instances of vector spaces. We
mention the following.

(i) The set consisting of the zero vector only. This vector space
is called the null space.





(ii) The set of all vectors of the form 

(iii) The set of all vectors of the form

$$\alpha_1\mathbf{x}_1 + \dots + \alpha_k\mathbf{x}_k, \tag{2.1.1}$$

where $\mathbf{x}_1, \dots, \mathbf{x}_k$ are fixed vectors.

The last example is particularly important. 

Definition 2.1.10. Let $\mathbf{x}_1, \dots, \mathbf{x}_k$ be given vectors over $\mathfrak{F}$. The
vector space $\mathfrak{V}$ of all vectors of the form (2.1.1), where $\alpha_1, \dots, \alpha_k$ are
variable scalars in $\mathfrak{F}$, is said to be SPANNED (or GENERATED) by
$\mathbf{x}_1, \dots, \mathbf{x}_k$; and these vectors are called GENERATORS of $\mathfrak{V}$.

It is, for instance, obvious that $\mathfrak{V}_n$ (over the real or the complex
field) is spanned by the vectors

$$\mathbf{e}_1 = (1, 0, \dots, 0), \quad \mathbf{e}_2 = (0, 1, \dots, 0), \quad \dots, \quad \mathbf{e}_n = (0, 0, \dots, 1). \tag{2.1.2}$$


Exercise 2.1.2. Can a vector space possess more than one set of 
generators ? 

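Exercise 2.1.3. If $\mathbf{x}_1, \dots, \mathbf{x}_{k+1}$ span a vector space $\mathfrak{V}$ and $\mathbf{x}_{k+1}$ depends
linearly on $\mathbf{x}_1, \dots, \mathbf{x}_k$, show that $\mathbf{x}_1, \dots, \mathbf{x}_k$ span $\mathfrak{V}$.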
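Deciding whether a given vector depends linearly on $\mathbf{x}_1, \dots, \mathbf{x}_k$, that is, whether it lies in the space they span (Definitions 2.1.7 and 2.1.10), amounts to solving a system of linear equations. The Python sketch below does this by Gauss-Jordan elimination over the rational field, which is an assumption about the reference field. It returns one suitable set of scalars, or `None` if there is none. The name `express` is hypothetical. The scalars need not be unique, so when there is a choice the sketch sets the free unknowns to 0.

```python
from fractions import Fraction


def express(x, gens):
    """Return scalars alpha_1, ..., alpha_k with sum(alpha_j * gens[j]) == x,
    or None if x does not depend linearly on gens (Gauss-Jordan elimination)."""
    n, k = len(x), len(gens)
    # Augmented matrix: column j holds gens[j], the last column holds x.
    rows = [[Fraction(g[i]) for g in gens] + [Fraction(x[i])] for i in range(n)]
    pivots, r = [], 0
    for c in range(k):
        p = next((i for i in range(r, n) if rows[i][c] != 0), None)
        if p is None:
            continue                      # no pivot in this column: a free unknown
        rows[r], rows[p] = rows[p], rows[r]
        piv = rows[r][c]
        rows[r] = [v / piv for v in rows[r]]
        for i in range(n):
            if i != r and rows[i][c] != 0:
                f = rows[i][c]
                rows[i] = [vi - f * vr for vi, vr in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
    # A remaining row reading 0 = (non-zero) means the system is inconsistent.
    if any(rows[i][k] != 0 for i in range(r, n)):
        return None
    alpha = [Fraction(0)] * k             # free unknowns are set to 0
    for i, c in enumerate(pivots):
        alpha[c] = rows[i][k]
    return alpha


print(express((2, 3, 5), [(1, 0, 1), (0, 1, 1)]))   # [Fraction(2, 1), Fraction(3, 1)]
print(express((1, 1, 0), [(1, 0, 1), (0, 1, 1)]))   # None
```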


2.2. Linear manifolds

2.2.1. It frequently happens that in studying the properties of
a mathematical system $S$ we come to recognize that the results
obtained do not depend essentially upon the precise definition of
$S$ but only upon certain formal relations holding between its
elements. In that case it may be useful to select a suitable set $R$ of
such relations and then consider a system $\bar{S}$ of entities of which
nothing is presupposed except that they, too, satisfy all relations
in $R$. The system $\bar{S}$ constructed in this way is said to be derived
from $S$ by a process of abstraction. If we now investigate the
properties of $\bar{S}$, the conclusions at which we arrive will, of course,
hold for $S$, since $S$ is a special case of $\bar{S}$. In this way we obtain
results more general than those relating to the system $S$ only; and
we are able, moreover, to distinguish between the basic structure
of $S$ and its more superficial features.

This procedure will be adopted in the present section and the
following one, the principal object being to study those properties
of vector spaces which are concerned with generators. As we shall
presently see, however, our investigation depends not so much
upon the actual definitions of vectors and vector spaces as upon
the fact that vector addition and multiplication by scalars obey
certain formal laws of algebra. We shall accordingly introduce the
notion of a ‘linear manifold’, derived by abstraction from that of a
vector space, and shall study the properties of this abstract system,
returning from time to time to the particular instance of a vector
space.

Definition 2.2.1. Let $\mathfrak{F}$ be a field and $\mathfrak{M}$ a set of elements
$X, Y, Z, \dots$. Suppose that with each $\alpha \in \mathfrak{F}$ and each $X \in \mathfrak{M}$ is
associated a definite element of $\mathfrak{M}$, denoted by $\alpha X$ (or $X\alpha$), and that
with each pair $X, Y \in \mathfrak{M}$ is associated a definite element of $\mathfrak{M}$, denoted
by $X + Y$. Suppose further that the operations of ‘multiplication by
scalars’ and ‘addition’ thus defined satisfy the following conditions for
all $X, Y, Z \in \mathfrak{M}$ and all $\alpha, \beta \in \mathfrak{F}$.

(i) $X + Y = Y + X$;

(ii) $X + (Y + Z) = (X + Y) + Z$;

(iii) the equation $Y + U = X$ is soluble for $U$;

(iv) $(\alpha + \beta)X = \alpha X + \beta X$;

(v) $\alpha(X + Y) = \alpha X + \alpha Y$;

(vi) $(\alpha\beta)X = \alpha(\beta X)$;

(vii) $1X = X$.

Then $\mathfrak{M}$ is called a LINEAR MANIFOLD over $\mathfrak{F}$ (with respect to
multiplication by scalars and addition as defined).

The terminology used in algebra is, as yet, far from standardized,
and the structure we describe as a linear manifold is also referred 
to in the literature as a vector space, an abstract vector space, a 
linear space, or a linear system. 

It will normally be assumed without explicit mention that a
reference field $\mathfrak{F}$ has been chosen and that statements about linear
manifolds relate to that field.

It is clear that a vector space is a particular instance of a linear
manifold; and in this case the elements of the linear manifold are
vectors. There are, of course, many other types of linear manifolds.
Consider, for example, the set of all continuous (or all differentiable)
functions defined in the interval $0 \le x \le 1$. This set is evidently
a linear manifold with respect to the addition of functions and their
multiplication by scalars.




Definition 2.2.2. Let $\mathfrak{M}$ be a linear manifold and let $\mathfrak{M}'$ be a set
of elements of $\mathfrak{M}$. If $\mathfrak{M}'$ is also a linear manifold with respect to the
same operations of multiplication by scalars and addition as $\mathfrak{M}$, then
$\mathfrak{M}'$ is called a (LINEAR) SUBMANIFOLD of $\mathfrak{M}$. In particular, if $\mathfrak{M}$
is a vector space, then $\mathfrak{M}'$ is called a (VECTOR) SUBSPACE of $\mathfrak{M}$.

Suppose that $X_1, \dots, X_k$ are any given elements of the linear
manifold $\mathfrak{M}$, and consider the set $\mathfrak{M}'$ consisting of all linear combinations
of $X_1, \dots, X_k$, i.e. the set of all elements of $\mathfrak{M}$ having the form

$$\alpha_1 X_1 + \dots + \alpha_k X_k,$$

where $\alpha_1, \dots, \alpha_k$ are arbitrary scalars of $\mathfrak{F}$. It is then obvious that
$\mathfrak{M}'$ is a submanifold of $\mathfrak{M}$. By analogy with Definition 2.1.10 we
shall say that $\mathfrak{M}'$ is spanned (or generated) by $X_1, \dots, X_k$, and we
shall call these elements generators of $\mathfrak{M}'$. Our terminology is then
clearly in agreement with that used in the special case of vector
spaces.


2.2.2. We next derive a series of results showing that the
operations of addition and multiplication by scalars in linear manifolds
obey the same formal laws as those defined for vectors.

It is assumed throughout that $\mathfrak{M}$ is a given linear manifold over
$\mathfrak{F}$, that $X, Y, \dots$ are elements of $\mathfrak{M}$, and that $\alpha, \beta$ are scalars in $\mathfrak{F}$.

Theorem 2.2.1. There exists a unique element $\Theta$, called the ZERO
ELEMENT of $\mathfrak{M}$, such that, for all $X \in \mathfrak{M}$,

$$X + \Theta = X. \tag{2.2.1}$$

To prove the existence of such an element, let $X_0$ be any fixed
element in $\mathfrak{M}$, and let $\Theta$ be a solution (known to exist by condition
(iii) in Definition 2.2.1) of the equation

$$X_0 + \Theta = X_0.$$

Let $X$ be an arbitrary element of $\mathfrak{M}$, and let $Y$ be a solution of

$$X_0 + Y = X.$$

Then, using conditions (i) and (ii) of Definition 2.2.1, we obtain

$$X + \Theta = (X_0 + Y) + \Theta = (Y + X_0) + \Theta = Y + (X_0 + \Theta) = Y + X_0 = X_0 + Y = X.$$




Thus $\mathfrak{M}$ contains at least one element $\Theta$ satisfying (2.2.1) for all
$X \in \mathfrak{M}$. If $\Theta'$ is another such element, then

$$\Theta' + \Theta = \Theta', \qquad \Theta + \Theta' = \Theta.$$

Hence, in view of condition (i), $\Theta' = \Theta$; and the uniqueness of $\Theta$ is
therefore established.

Theorem 2.2.2. If $X + U = Y + U$, then $X = Y$.

This result may be described as a cancellation law for addition.
To prove it we denote by $U'$ an element of $\mathfrak{M}$ satisfying

$$U + U' = \Theta.$$

Then $X + U = Y + U$ implies

$$(X + U) + U' = (Y + U) + U'.$$

Hence, by (ii),

$$X + (U + U') = Y + (U + U'),$$

i.e. $X + \Theta = Y + \Theta$,

and so, by (2.2.1), $X = Y$, as asserted.

One of the postulates in Definition 2.2.1 was the solubility of the
equation $Y + U = X$ for $U$. We shall now show that this equation
has precisely one solution.

Theorem 2.2.3. The solution $U$ of the equation $Y + U = X$ is
unique.

Let $U, U'$ be elements of $\mathfrak{M}$ such that

$$Y + U = X, \qquad Y + U' = X.$$

Then $Y + U = Y + U'$, so that, by (i),

$$U + Y = U' + Y,$$

and therefore, by Theorem 2.2.2, $U = U'$.

Definition 2.2.3. The unique element $U$ satisfying the equation
$Y + U = X$ is denoted by $X - Y$. In particular, $\Theta - X$ is denoted
by $-X$.

This definition implies the identity

$$(X - Y) + Y = X.$$




Theorem 2.2.4. For all $X \in \mathfrak{M}$

$$X + (-X) = \Theta; \qquad X - X = \Theta.$$

Denote by $X'$ the unique solution of $X + X' = \Theta$. Then
$X' = \Theta - X = -X$, and so $X + (-X) = \Theta$. Again, denote by $X''$
the unique solution of $X + X'' = X$. Then $X'' = X - X$. But, in
view of (2.2.1), $X'' = \Theta$ is also a solution of the equation $X + X'' = X$.
Hence $X - X = \Theta$.

Theorem 2.2.5. For all $X, Y \in \mathfrak{M}$

$$X - Y = X + (-Y).$$

By condition (i) of Definition 2.2.1 and Theorem 2.2.4 we have

$$(-Y) + Y = \Theta.$$

Hence $X + \{(-Y) + Y\} = X + \Theta$,

and so, by condition (ii) and (2.2.1),

$$\{X + (-Y)\} + Y = X.$$

Thus (using condition (i) once more) the element $U = X + (-Y)$ satisfies the equation $Y + U = X$.
But the unique solution of this equation is $U = X - Y$, and the
theorem therefore follows.

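Theorem 2.2.6. For all $X \in \mathfrak{M}$ and all $\alpha \in \mathfrak{F}$

$$0X = \Theta, \qquad \alpha\Theta = \Theta.$$

Using conditions (iv) and (v) of Definition 2.2.1 we have

$$\alpha X + 0X = (\alpha + 0)X = \alpha X = \alpha X + \Theta,$$

$$\alpha X + \alpha\Theta = \alpha(X + \Theta) = \alpha X = \alpha X + \Theta,$$

and the assertion now follows at once by Theorem 2.2.2.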

Theorem 2.2.7. For all $X \in \mathfrak{M}$

$$(-1)X = -X; \qquad -(-X) = X.$$

Using Theorem 2.2.6 and conditions (iv) and (vii) we have

$$\Theta = 0X = \{1 + (-1)\}X = 1X + (-1)X = X + (-1)X.$$

Hence $(-1)X = \Theta - X = -X$.

This, together with conditions (vi) and (vii), implies

$$-(-X) = (-1)\{(-1)X\} = \{(-1)(-1)\}X = 1X = X.$$




Exercise 2.2.1. Show that, for all $X \in \mathfrak{M}$ and all $\alpha \in \mathfrak{F}$,

$$(-\alpha)X = -\alpha X.$$

Exercise 2.2.2. Show that, for all $X, Y \in \mathfrak{M}$ and all $\alpha, \beta \in \mathfrak{F}$,

$$\alpha(X - Y) = \alpha X - \alpha Y, \qquad (\alpha - \beta)X = \alpha X - \beta X.$$

Exercise 2.2.3. Show that, if $\alpha \ne 0$, $X \ne \Theta$, then $\alpha X \ne \Theta$.

Exercise 2.2.4. Let $X_1, \dots, X_k$ be generators of a linear manifold $\mathfrak{M}$, and
let $Y_1, \dots, Y_l$ be any elements of $\mathfrak{M}$. Show that $X_1, \dots, X_k, Y_1, \dots, Y_l$ are generators
of $\mathfrak{M}$.

Exercise 2.2.5. If a linear manifold $\mathfrak{M}$ is spanned by the elements
$X_1, \dots, X_r, Y_1, \dots, Y_s$, and if each $Y$ is a linear combination of the $X$'s, show that
$\mathfrak{M}$ is spanned by $X_1, \dots, X_r$.

2.3. Linear dependence and bases 

2.3.1. Let $\mathfrak{M}$ again denote a linear manifold over a reference
field $\mathfrak{F}$. Since every vector space is a linear manifold, the definitions
and results developed below apply (with appropriate verbal
changes) to the special case of vector spaces.

Definition 2.3.1. The elements $X_1, \dots, X_k$ of $\mathfrak{M}$ are LINEARLY
DEPENDENT if there exist scalars $\alpha_1, \dots, \alpha_k$, not all zero, such that

$$\alpha_1 X_1 + \dots + \alpha_k X_k = \Theta. \tag{2.3.1}$$

In the contrary case, i.e. when (2.3.1) implies $\alpha_1 = \dots = \alpha_k = 0$, the
elements $X_1, \dots, X_k$ are LINEARLY INDEPENDENT.

It should be observed that linear dependence or independence is
a property of the unordered set of elements $X_1, \dots, X_k$. An analogous
remark applies to a number of statements below.

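For vectors over $\mathfrak{F}$, Definition 2.3.1 states that $\mathbf{x}_1, \dots, \mathbf{x}_k$ are
linearly dependent or linearly independent according as there
exist or do not exist scalars $\alpha_1, \dots, \alpha_k$, not all zero, such that

$$\alpha_1\mathbf{x}_1 + \dots + \alpha_k\mathbf{x}_k = \mathbf{0}.$$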

The concept of linear dependence will be seen to be of crucial
importance for almost all topics treated in this book.†
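For vectors with rational components, linear independence can be tested mechanically. Row reduction does not change the space spanned by the vectors, and it is a standard fact that the number of non-zero rows left at the end is the maximum number of linearly independent vectors among them (compare Exercise 2.3.5 below). The Python sketch below assumes rational components, and `rank` and `linearly_independent` are hypothetical helper names.

```python
from fractions import Fraction


def rank(vectors):
    """Maximum number of linearly independent vectors among `vectors`
    (row reduction to echelon form over the rationals)."""
    rows = [[Fraction(c) for c in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for c in range(ncols):
        p = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def linearly_independent(vectors):
    """x_1, ..., x_k are linearly independent iff their rank equals k."""
    return rank(vectors) == len(vectors)


print(linearly_independent([(1, 2, 3), (0, 1, 4), (5, 6, 0)]))  # True
print(linearly_independent([(1, 2, 3), (2, 4, 6)]))             # False
print(linearly_independent([(0, 0, 0)]))                        # False
```

The last line illustrates Exercise 2.3.2: a set containing the zero vector is never linearly independent.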

Exercise 2.3.1. Using Exercise 2.2.3 interpret the terms ‘linear dependence’
and ‘linear independence’ for the case of a set consisting of a single
element only.

Exercise 2.3.2. Show that a set of linearly independent elements of a
linear manifold cannot contain the zero element.

† For geometrical applications of the concept of linear dependence see Bôcher,
11, Chapter iii.




Exercise 2.3.3. (i) Show that if $X_1, \dots, X_k, Y_1, \dots, Y_l$ are linearly independent,
then so are $X_1, \dots, X_k$. (ii) Show that if $X_1, \dots, X_k$ are linearly dependent and
$Y_1, \dots, Y_l$ are any elements, then $X_1, \dots, X_k, Y_1, \dots, Y_l$ are linearly dependent.

Definition 2.3.2. A BASIS† of a linear manifold $\mathfrak{M}$ is a finite
ordered set of elements $X_1, \dots, X_k \in \mathfrak{M}$ having the property that every
$X \in \mathfrak{M}$ is expressible as a unique linear combination of $X_1, \dots, X_k$.

Thus the elements $X_1, \dots, X_k$ constitute a basis if every $X \in \mathfrak{M}$ is
expressible in the form

$$X = \alpha_1 X_1 + \dots + \alpha_k X_k,$$

where $\alpha_1, \dots, \alpha_k$ are uniquely determined.‡

If a basis exists in $\mathfrak{M}$, then it furnishes $\mathfrak{M}$ with a ‘coordinate
system’, since we may regard the scalars $\alpha_1, \dots, \alpha_k$ specifying $X$ as
the ‘coordinates’ of $X$. This idea will be discussed at greater length
in § 2.4.
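In the special case of $\mathfrak{V}_n$, the coordinates of a vector relative to a basis are found by solving a system of $n$ linear equations. The Python sketch below assumes rational components, and the name `coordinates` is hypothetical. The two calls at the end also show why the order of the basis elements matters once coordinates are attached to them.

```python
from fractions import Fraction


def coordinates(x, basis):
    """Coordinates (alpha_1, ..., alpha_n) of x relative to the ordered basis
    {x_1, ..., x_n} of V_n, i.e. the unique scalars with x = sum alpha_i x_i.
    Gauss-Jordan elimination over the rationals."""
    n = len(basis)
    if len(x) != n or any(len(v) != n for v in basis):
        raise ValueError("expected n vectors of order n")
    # Augmented matrix: column j holds basis[j], the last column holds x.
    m = [[Fraction(basis[j][i]) for j in range(n)] + [Fraction(x[i])] for i in range(n)]
    for c in range(n):
        p = next((i for i in range(c, n) if m[i][c] != 0), None)
        if p is None:
            raise ValueError("the vectors are linearly dependent, so not a basis")
        m[c], m[p] = m[p], m[c]
        piv = m[c][c]
        m[c] = [v / piv for v in m[c]]
        for i in range(n):
            if i != c and m[i][c] != 0:
                f = m[i][c]
                m[i] = [vi - f * vc for vi, vc in zip(m[i], m[c])]
    return [m[i][n] for i in range(n)]


print(coordinates((3, 1), [(1, 1), (1, -1)]))  # [Fraction(2, 1), Fraction(1, 1)]
print(coordinates((3, 1), [(1, -1), (1, 1)]))  # [Fraction(1, 1), Fraction(2, 1)]
```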

We shall frequently denote a basis by a single symbol, say $\mathfrak{B}$,
and shall indicate the fact that it consists of the elements $X_1, \dots, X_k$
(in that order) by writing

$$\mathfrak{B} = \{X_1, \dots, X_k\}.$$

It is, of course, obvious that if $\mathfrak{B}$ is a basis of a linear manifold $\mathfrak{M}$,
then so is any (ordered) set $\mathfrak{B}'$ obtained by rearranging in any way
the elements in $\mathfrak{B}$. This fact suggests that it might be more convenient
to regard a basis as an unordered set of elements. However,
the order of elements in a basis is relevant in any discussion of
‘representations’,§ and we prefer, therefore, to retain Definition
2.3.2 as it stands.

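The total vector space $\mathfrak{V}_n$ has a basis consisting of the vectors
$\mathbf{e}_1, \dots, \mathbf{e}_n$ defined by (2.1.2) (p. 43); for the vector $(x_1, \dots, x_n)$ can be
written in the form

$$(x_1, \dots, x_n) = x_1\mathbf{e}_1 + \dots + x_n\mathbf{e}_n,$$

and this representation is clearly unique. We shall always denote
the basis $\{\mathbf{e}_1, \dots, \mathbf{e}_n\}$ of $\mathfrak{V}_n$ by the symbol $\mathfrak{E}$.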

Whereas the vector space $\mathfrak{V}_n$ possesses a basis, the linear manifold
of functions continuous in the interval $0 \le x \le 1$ has none.
Thus we do not know in advance whether a given linear manifold
possesses a basis, and the aim of the present section is to clarify the
situation. In particular we shall demonstrate that every vector
space possesses a basis.

† What is here described as a basis is more commonly known as a finite basis.

‡ In other words, if $X = \alpha_1 X_1 + \dots + \alpha_k X_k$ and $X = \alpha_1' X_1 + \dots + \alpha_k' X_k$, then
$\alpha_1 = \alpha_1', \dots, \alpha_k = \alpha_k'$.

§ See § 2.4 and Chapter IV.

2.3.2. Theorem 2.3.1. Every basis of a linear manifold is a set
of linearly independent generators, and conversely.

Let $\{X_1, \dots, X_k\}$ be a basis of a linear manifold $\mathfrak{M}$. Then $X_1, \dots, X_k$
are obviously generators of $\mathfrak{M}$. Moreover, suppose scalars $\alpha_1, \dots, \alpha_k$
are chosen such that

$$\alpha_1 X_1 + \dots + \alpha_k X_k = \Theta. \tag{2.3.2}$$

In view of Theorems 2.2.6 (p. 47) and 2.2.1 (p. 45) we also have

$$0X_1 + \dots + 0X_k = \Theta.$$

But the representation (2.3.2) of $\Theta$ in terms of $X_1, \dots, X_k$ is unique,
and therefore $\alpha_1 = \dots = \alpha_k = 0$, i.e. $X_1, \dots, X_k$ are linearly independent.
This establishes the first part of the theorem.

To prove the converse, let $X_1, \dots, X_k$ be a set of linearly independent
generators of $\mathfrak{M}$. If $X \in \mathfrak{M}$, then there exist scalars $\beta_1, \dots, \beta_k$
such that

$$X = \beta_1 X_1 + \dots + \beta_k X_k. \tag{2.3.3}$$

Suppose that we also have

$$X = \gamma_1 X_1 + \dots + \gamma_k X_k.$$

In view of Theorem 2.2.4 (p. 47) and Exercise 2.2.2 (p. 48) it
follows easily that

$$\Theta = (\beta_1 - \gamma_1)X_1 + \dots + (\beta_k - \gamma_k)X_k.$$

But $X_1, \dots, X_k$ are linearly independent and therefore

$$\beta_1 = \gamma_1, \ \dots, \ \beta_k = \gamma_k.$$

The representation (2.3.3) of $X$ is therefore unique, and so
$\{X_1, \dots, X_k\}$ is a basis.


Some linear manifolds (for example that consisting of all
functions continuous in the interval $0 \le x \le 1$) possess arbitrarily
large sets of linearly independent elements. However, our main
concern is with those linear manifolds for which that is not the case.

Definition 2.3.3. The DIMENSIONALITY $d(\mathfrak{M})$ of a linear
manifold $\mathfrak{M}$ is the maximum value of $r$ for which $\mathfrak{M}$ contains $r$ linearly
independent elements.† If no such maximum value exists, then
$d(\mathfrak{M}) = \infty$.

† If $\mathfrak{M}$ is the null manifold, i.e. the linear manifold consisting of the zero
element only, we define $d(\mathfrak{M})$ as 0.




The statement $d(\mathfrak{M}) = r$ (where $r > 0$) means, then, that $\mathfrak{M}$
contains at least one set of $r$ linearly independent elements, but no
set of $r+1$ (and therefore no set of more than $r+1$) linearly independent
elements. In that case we say that $\mathfrak{M}$ is $r$-dimensional.

A linear manifold $\mathfrak{M}$ having finite dimensionality is said to be
finite-dimensional. In contrast, if $d(\mathfrak{M}) = \infty$ (i.e. if $\mathfrak{M}$ contains
arbitrarily large sets of linearly independent elements), then $\mathfrak{M}$
is said to be infinite-dimensional.

Exercise 2.3.4. Show that $d(\mathfrak{V}_1) = 1$, $d(\mathfrak{V}_2) = 2$.

Exercise 2.3.5. Let the elements $X_1, \dots, X_k$ of a linear manifold $\mathfrak{M}$ span
a submanifold $\mathfrak{M}'$. Show that $d(\mathfrak{M}')$ is equal to the maximum number of
linearly independent elements among $X_1, \dots, X_k$.

We are now able to establish the connexion between the dimensionality
of a linear manifold and the nature of its bases by showing
that if a linear manifold has a basis consisting of r elements, then 
the dimensionality of that manifold is r, and conversely. 

Theorem 2.3.2. (Basis theorem for linear manifolds)

Let $\mathfrak{M}$ be a linear manifold.

(i) If $\mathfrak{M}$ has a basis of $r$ elements, then $d(\mathfrak{M}) = r$ and every basis
of $\mathfrak{M}$ consists of $r$ (linearly independent) elements.

(ii) If $d(\mathfrak{M}) = r$, then $\mathfrak{M}$ has a basis of $r$ elements and, indeed,
every set of $r$ linearly independent elements is a basis of $\mathfrak{M}$.


It should be noted that this theorem asserts, in particular, the 
existence of at least one basis in every finite-dimensional linear 
manifold. 

Let $\{E_1, \dots, E_r\}$ be a basis of $\mathfrak{M}$ and let $X_1, \dots, X_{r+1}$ be any $r+1$
elements. Then there exist scalars $\xi_{ij}$ $(i = 1, \dots, r;\ j = 1, \dots, r+1)$
such that

$$X_j = \sum_{i=1}^{r} \xi_{ij} E_i \qquad (j = 1, \dots, r+1).$$

Now, by Theorem 1.6.2 (p. 29), we know that there exist scalars
$t_1, \dots, t_{r+1}$, not all 0, such that

$$\sum_{j=1}^{r+1} \xi_{ij} t_j = 0 \qquad (i = 1, \dots, r).$$

Hence

$$\sum_{j=1}^{r+1} t_j X_j = \sum_{j=1}^{r+1} t_j \sum_{i=1}^{r} \xi_{ij} E_i = \sum_{i=1}^{r} \Bigl(\sum_{j=1}^{r+1} \xi_{ij} t_j\Bigr) E_i = \Theta,$$

and so $X_1, \dots, X_{r+1}$ are linearly dependent. In view of Exercise
2.3.3 (ii) (p. 49) it follows that any $s$ elements in $\mathfrak{M}$ are linearly
dependent if $s > r$. On the other hand, there exists a set of $r$ linearly
independent elements, namely $E_1, \dots, E_r$ (by Theorem 2.3.1). Hence $d(\mathfrak{M}) = r$. Moreover,
if $\mathfrak{M}$ has another basis of $r'$ elements, then, by what has just
been proved, $d(\mathfrak{M}) = r'$ and so $r' = r$. The proof of (i) is therefore
complete.

To establish (ii) suppose that $d(\mathfrak{M}) = r$ and denote by $E_1, \dots, E_r$
any set of $r$ linearly independent elements in $\mathfrak{M}$. If $X \in \mathfrak{M}$, then,
by the definition of $r$, the $r+1$ elements $X, E_1, \dots, E_r$ are linearly
dependent, i.e. there exist scalars $\alpha, \alpha_1, \dots, \alpha_r$, not all 0, such that

$$\alpha X + \alpha_1 E_1 + \dots + \alpha_r E_r = \Theta. \tag{2.3.4}$$

Now if $\alpha$ were 0, then (2.3.4) would imply the linear dependence of
$E_1, \dots, E_r$; hence $\alpha \ne 0$, and

$$X = -\frac{\alpha_1}{\alpha} E_1 - \dots - \frac{\alpha_r}{\alpha} E_r.$$

Thus $E_1, \dots, E_r$ are generators of $\mathfrak{M}$, and since they are linearly
independent they constitute, by Theorem 2.3.1, a basis of $\mathfrak{M}$. This
completes the proof.

Theorem 2.3.2 applies, in particular, to vector spaces; but in 
that special case we can obtain a stronger result, since the finiteness 
of dimensionality now follows as a consequence of definitions and 
need not be taken as a separate postulate. 

Theorem 2.3.3. (Basis theorem for vector spaces)

Let $\mathfrak{V}$ be a non-null vector space of order $n$. Then $\mathfrak{V}$ has finite
dimensionality $r > 0$, and (i) $r \le n$; (ii) any $r$ linearly independent
vectors of $\mathfrak{V}$ constitute a basis; (iii) every basis of $\mathfrak{V}$ consists of $r$
(linearly independent) vectors.

In view of Theorem 2.3.2 it is sufficient to prove that the dimensionality
$r$ of $\mathfrak{V}$ is finite and satisfies the inequality $r \le n$. We have,
in fact, to show that any $n+1$ vectors in $\mathfrak{V}$, say $\mathbf{x}_1, \dots, \mathbf{x}_{n+1}$, are linearly
dependent. Write

$$\mathbf{x}_i = (x_{1i}, \dots, x_{ni}) \qquad (i = 1, \dots, n+1).$$

We know by Theorem 1.6.2 (p. 29) that there exist scalars $t_1, \dots, t_{n+1}$,
not all 0, such that

$$\sum_{i=1}^{n+1} x_{ki} t_i = 0 \qquad (k = 1, \dots, n).$$

Now these relations are equivalent to the single vector equation

$$\sum_{i=1}^{n+1} t_i \mathbf{x}_i = \mathbf{0},$$

and the proof is therefore complete.†
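The proof rests on Theorem 1.6.2: a homogeneous system with more unknowns than equations has a non-trivial solution. For rational vectors, such a solution $t_1, \dots, t_k$ can be computed by bringing the system $\sum_i x_{ki} t_i = 0$ to reduced echelon form and giving one free unknown the value 1. The Python sketch below does this, and the name `find_dependence` is hypothetical. With $k = n+1$ vectors of order $n$, at least one unknown is always free, so the function never returns `None` in that case.

```python
from fractions import Fraction


def find_dependence(vectors):
    """Scalars t_1, ..., t_k, not all 0, with sum(t_i * x_i) == 0, or None if the
    vectors are linearly independent.  Solves sum_i x_{ki} t_i = 0 (k = 1..n)
    by Gauss-Jordan elimination."""
    k, n = len(vectors), len(vectors[0])
    # One equation per component, one unknown t_i per vector.
    m = [[Fraction(v[row]) for v in vectors] for row in range(n)]
    pivots, r = [], 0
    for c in range(k):
        p = next((i for i in range(r, n) if m[i][c] != 0), None)
        if p is None:
            continue                      # column c corresponds to a free unknown
        m[r], m[p] = m[p], m[r]
        piv = m[r][c]
        m[r] = [v / piv for v in m[r]]
        for i in range(n):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    free = [c for c in range(k) if c not in pivots]
    if not free:
        return None                       # only the trivial solution exists
    t = [Fraction(0)] * k
    t[free[0]] = Fraction(1)              # give one free unknown the value 1
    for row, c in enumerate(pivots):
        t[c] = -m[row][free[0]]
    return t


# Three vectors of order 2 must be linearly dependent (n + 1 > n).
t = find_dependence([(1, 2), (3, 4), (5, 6)])
assert t == [1, -2, 1]   # 1*(1, 2) - 2*(3, 4) + 1*(5, 6) = (0, 0)
```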

Exercise 2.3.6. Write out a proof of Theorem 2.3.3 without introducing
the notion of a linear manifold or quoting the result of Theorem 2.3.2.

Exercise 2.3.7. Show that the vectors $(a, b)$, $(c, d)$ constitute a basis
of $\mathfrak{V}_2$ if and only if $ad \ne bc$.
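The "if" half of Exercise 2.3.7 can be made concrete. When $ad \ne bc$, Cramer's rule gives explicit scalars $\alpha, \beta$ with $(p, q) = \alpha(a, b) + \beta(c, d)$ for every vector $(p, q)$. The Python sketch below assumes integer or rational entries, and the name `coords_in_V2` is hypothetical. It raises an error when $ad = bc$ and does not prove the converse.

```python
from fractions import Fraction


def coords_in_V2(x, u, v):
    """Coordinates (alpha, beta) with x = alpha*u + beta*v, where u = (a, b),
    v = (c, d), computed by Cramer's rule; requires ad - bc != 0."""
    (a, b), (c, d), (p, q) = u, v, x
    D = a * d - b * c
    if D == 0:
        raise ValueError("ad = bc: (a, b), (c, d) do not form a basis of V_2")
    return Fraction(p * d - q * c, D), Fraction(a * q - b * p, D)


alpha, beta = coords_in_V2((3, 1), (1, 1), (1, -1))
print(alpha, beta)  # 2 1
```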

Corollary 1. The dimensionality of the total vector space $\mathfrak{V}_n$ is
$n$, and a basis of $\mathfrak{V}_n$ is simply a set of $n$ linearly independent vectors
of order $n$.

Since $\mathfrak{V}_n$ possesses a basis $\mathfrak{E}$ consisting of the $n$ vectors $\mathbf{e}_1, \dots, \mathbf{e}_n$,
it follows that $d(\mathfrak{V}_n) = n$. The remainder of the assertion is then
an immediate consequence of Theorem 2.3.3, (ii) and (iii).

Corollary 2. If a vector space is of order $n$ and has dimensionality
$n$, then it is the total vector space $\mathfrak{V}_n$.

In view of Theorem 2.3.3 (iii), any basis of the given vector
space $\mathfrak{V}$ consists of $n$ linearly independent vectors. But, by
Corollary 1, these $n$ vectors span $\mathfrak{V}_n$. Hence $\mathfrak{V}_n \subset \mathfrak{V}$, and since,
trivially, $\mathfrak{V} \subset \mathfrak{V}_n$, we obtain $\mathfrak{V} = \mathfrak{V}_n$.

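Theorem 2.3.4. If $\mathfrak{V}, \mathfrak{V}'$ are two vector spaces of dimensionalities
$r, r'$ respectively and $\mathfrak{V} \subset \mathfrak{V}'$, then $r \le r'$. Moreover, if $r = r'$, then
$\mathfrak{V} = \mathfrak{V}'$.

The vector space $\mathfrak{V}$ contains a set of $r$ linearly independent
vectors, and these also belong to $\mathfrak{V}'$. Now $\mathfrak{V}'$ contains no set of
$r'+1$ linearly independent vectors, and so $r < r'+1$, i.e. $r \le r'$.

If $\mathfrak{V} \subset \mathfrak{V}'$ and $r = r'$, then a basis of $\mathfrak{V}$ consists of $r$ linearly independent
vectors of $\mathfrak{V}'$ and so, by Theorem 2.3.3 (ii), is also a basis of $\mathfrak{V}'$. Thus $\mathfrak{V}$ and $\mathfrak{V}'$
are both spanned by the same set of $r$ vectors, and so are identical.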

2.3.3. We know by Theorem 2.3.2 that every finite-dimensional
linear manifold possesses a basis. We shall now prove that a basis 
can be constructed if we start from an arbitrary set of linearly 
independent elements in the linear manifold. 

† For an alternative argument establishing the existence of bases in vector
spaces and the invariance of the number of basis elements see MacDuffee, 23,
205-7, or Lichnerowicz, 17, 9-12. The former treatment also includes a proof of
Steinitz's ‘replacement theorem’.




Theorem 2.3.5. (Completion of basis)

Any set of linearly independent elements in a finite-dimensional
linear manifold $\mathfrak{M}$ is part of a basis of $\mathfrak{M}$.

Let $d(\mathfrak{M}) = r$ and let $X_1, \dots, X_k$ be a set of linearly independent
elements in $\mathfrak{M}$, so that $k \le r$. If $k = r$, then, by Theorem 2.3.2 (ii),
$X_1, \dots, X_k$ constitute a basis of $\mathfrak{M}$, and the conclusion of the theorem
therefore holds. If, on the other hand, $k < r$, then let us denote
by $\mathfrak{L}$ the set of all linear combinations of $X_1, \dots, X_k$. Now, in view
of Theorem 2.3.2 (i), $X_1, \dots, X_k$ cannot generate $\mathfrak{M}$ and so there
exists some element $X_{k+1} \in \mathfrak{M}$ such that $X_{k+1} \notin \mathfrak{L}$. Now suppose that

$$\alpha_1 X_1 + \dots + \alpha_k X_k + \alpha_{k+1} X_{k+1} = \Theta.$$

Then $\alpha_{k+1} = 0$, for otherwise we should have $X_{k+1} \in \mathfrak{L}$. Hence

$$\alpha_1 X_1 + \dots + \alpha_k X_k = \Theta,$$

and, since $X_1, \dots, X_k$ are linearly independent, it follows that
$\alpha_1 = \dots = \alpha_k = 0$. The elements $X_1, \dots, X_k, X_{k+1}$ are therefore
linearly independent.

We have thus shown that if $k < r$ and $X_1, \dots, X_k$ are linearly
independent elements in $\mathfrak{M}$, then a further element $X_{k+1} \in \mathfrak{M}$ can
be found such that $X_1, \dots, X_k, X_{k+1}$ are linearly independent. Repeating
this process as often as is necessary, we ultimately obtain a
set of $r$ linearly independent elements $X_1, \dots, X_k, X_{k+1}, \dots, X_r$. By
Theorem 2.3.2 (ii) these elements constitute a basis of $\mathfrak{M}$, and the
assertion is therefore proved.
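In $\mathfrak{V}_n$ the step "choose $X_{k+1}$ outside the span" can be made definite by trying the unit vectors $\mathbf{e}_1, \dots, \mathbf{e}_n$ of (2.1.2) in turn. A unit vector that is skipped already lies in the current span and stays there, so at the end all of $\mathbf{e}_1, \dots, \mathbf{e}_n$ are in the span and, by Corollary 1 to Theorem 2.3.3, the result is a basis. This greedy choice is only one possibility, used here for illustration. The Python sketch below assumes rational components, and `rank` and `complete_basis` are hypothetical helper names.

```python
from fractions import Fraction


def rank(vectors):
    """Maximum number of linearly independent vectors in the list
    (row reduction over the rationals)."""
    rows = [[Fraction(c) for c in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for c in range(ncols):
        p = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            rows[i] = [x - f * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r


def complete_basis(independent, n):
    """Extend linearly independent vectors of order n to a basis of V_n by
    adjoining, in turn, each unit vector e_i lying outside the current span."""
    basis = [tuple(Fraction(c) for c in v) for v in independent]
    if rank(basis) != len(basis):
        raise ValueError("the starting vectors are not linearly independent")
    for i in range(n):
        if len(basis) == n:
            break
        e = tuple(Fraction(1 if j == i else 0) for j in range(n))
        if rank(basis + [e]) == len(basis) + 1:   # e is not in the span
            basis.append(e)
    return basis


basis = complete_basis([(1, 1, 0)], 3)
assert basis == [(1, 1, 0), (1, 0, 0), (0, 0, 1)]
```

By Theorem 2.3.8 below, the adjoined vectors (here $\mathbf{e}_1$ and $\mathbf{e}_3$) span a complement of the subspace spanned by the starting vectors. This is the construction used in the proof of Theorem 2.3.9.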

Corollary. Any set of linearly independent vectors in a vector
space $\mathfrak{V}$ is part of a basis of $\mathfrak{V}$.

2.3.4. We shall conclude the present section with a brief discussion
of complements. It will be understood that $\mathfrak{U}, \mathfrak{U}'$ denote throughout
subspaces of $\mathfrak{V}_n$.

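Definition 2.3.4. The vector spaces $\mathfrak{U}$ and $\mathfrak{U}'$ are COMPLEMENTS
(of each other) if every $\mathbf{x} \in \mathfrak{V}_n$ has a unique representation of the form

$$\mathbf{x} = \mathbf{u} + \mathbf{u}' \qquad (\mathbf{u} \in \mathfrak{U},\ \mathbf{u}' \in \mathfrak{U}'). \tag{2.3.5}$$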

We first require a preliminary result. 

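Theorem 2.3.6. Suppose that $\mathfrak{U}, \mathfrak{U}'$ are subspaces of $\mathfrak{V}_n$ whose
only common vector is $\mathbf{0}$. If $\mathbf{y}_1, \dots, \mathbf{y}_r$ are linearly independent vectors
in $\mathfrak{U}$ and $\mathbf{z}_1, \dots, \mathbf{z}_s$ are linearly independent vectors in $\mathfrak{U}'$, then the $r+s$
vectors $\mathbf{y}_1, \dots, \mathbf{y}_r, \mathbf{z}_1, \dots, \mathbf{z}_s$ are linearly independent.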




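Suppose that

$$\alpha_1\mathbf{y}_1 + \dots + \alpha_r\mathbf{y}_r + \beta_1\mathbf{z}_1 + \dots + \beta_s\mathbf{z}_s = \mathbf{0},$$

i.e.

$$\alpha_1\mathbf{y}_1 + \dots + \alpha_r\mathbf{y}_r = -\beta_1\mathbf{z}_1 - \dots - \beta_s\mathbf{z}_s.$$

Here the left-hand side is a vector in $\mathfrak{U}$ and the right-hand side is a
vector in $\mathfrak{U}'$. But, by hypothesis, $\mathbf{0}$ is the only common vector of
$\mathfrak{U}$ and $\mathfrak{U}'$, and therefore

$$\alpha_1\mathbf{y}_1 + \dots + \alpha_r\mathbf{y}_r = \mathbf{0}, \qquad \beta_1\mathbf{z}_1 + \dots + \beta_s\mathbf{z}_s = \mathbf{0}.$$

This implies that $\alpha_1 = \dots = \alpha_r = \beta_1 = \dots = \beta_s = 0$, and the
assertion therefore follows.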

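Theorem 2.3.7. The subspaces $\mathfrak{U}, \mathfrak{U}'$ of $\mathfrak{V}_n$ are complements
if and only if $\mathbf{0}$ is the only common vector of $\mathfrak{U}$ and $\mathfrak{U}'$, and

$$d(\mathfrak{U}) + d(\mathfrak{U}') = n.$$

First, let $\mathfrak{U}$ and $\mathfrak{U}'$ be complements. If $\mathbf{x}$ belongs to both $\mathfrak{U}$ and
$\mathfrak{U}'$ we may write

$$\mathbf{x} = \mathbf{x} + \mathbf{0} \qquad (\mathbf{x} \in \mathfrak{U},\ \mathbf{0} \in \mathfrak{U}'),$$

$$\mathbf{x} = \mathbf{0} + \mathbf{x} \qquad (\mathbf{0} \in \mathfrak{U},\ \mathbf{x} \in \mathfrak{U}').$$

But the representation of $\mathbf{x}$ in the form (2.3.5) is unique, and so
$\mathbf{x} = \mathbf{0}$. Thus $\mathbf{0}$ is the only common vector of $\mathfrak{U}$ and $\mathfrak{U}'$.

Now write $d(\mathfrak{U}) = r$, $d(\mathfrak{U}') = s$. If either $\mathfrak{U}$ or $\mathfrak{U}'$ is the null
space, then obviously $r + s = n$. We may, therefore, assume that
$r \ge 1$, $s \ge 1$. Let $\{\mathbf{y}_1, \dots, \mathbf{y}_r\}$, $\{\mathbf{z}_1, \dots, \mathbf{z}_s\}$ be any bases of $\mathfrak{U}, \mathfrak{U}'$
respectively. Then, by Theorem 2.3.6, $\mathbf{y}_1, \dots, \mathbf{y}_r, \mathbf{z}_1, \dots, \mathbf{z}_s$ are linearly
independent vectors. Since, moreover, every $\mathbf{x} \in \mathfrak{V}_n$ is representable
in the form (2.3.5), it follows that it is representable in the form

$$\mathbf{x} = \alpha_1\mathbf{y}_1 + \dots + \alpha_r\mathbf{y}_r + \beta_1\mathbf{z}_1 + \dots + \beta_s\mathbf{z}_s. \tag{2.3.6}$$

Hence $\mathbf{y}_1, \dots, \mathbf{y}_r, \mathbf{z}_1, \dots, \mathbf{z}_s$ are linearly independent generators of $\mathfrak{V}_n$,
and so, by Theorems 2.3.1 (p. 50) and 2.3.3 (iii) (p. 52),

$$r + s = d(\mathfrak{V}_n) = n.$$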

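Next, suppose that $\mathbf{0}$ is the only common vector of $\mathfrak{U}$ and $\mathfrak{U}'$,
and that $d(\mathfrak{U}) + d(\mathfrak{U}') = n$. Write, as before, $d(\mathfrak{U}) = r$, $d(\mathfrak{U}') = s$,
and denote by $\{\mathbf{y}_1, \dots, \mathbf{y}_r\}$, $\{\mathbf{z}_1, \dots, \mathbf{z}_s\}$ bases of $\mathfrak{U}, \mathfrak{U}'$ respectively. Then,
in view of Theorem 2.3.6, the $n$ vectors $\mathbf{y}_1, \dots, \mathbf{y}_r, \mathbf{z}_1, \dots, \mathbf{z}_s$ are linearly
independent and so (by Corollary 1 to Theorem 2.3.3) constitute
a basis of $\mathfrak{V}_n$; every $\mathbf{x} \in \mathfrak{V}_n$ can therefore be represented in the
form (2.3.6), and so in the form (2.3.5). The representation (2.3.5) is,
moreover, unique; for if we have

$$\mathbf{x} = \mathbf{u} + \mathbf{u}', \qquad \mathbf{x} = \mathbf{v} + \mathbf{v}' \qquad (\mathbf{u}, \mathbf{v} \in \mathfrak{U};\ \mathbf{u}', \mathbf{v}' \in \mathfrak{U}'),$$

then $\mathbf{u} - \mathbf{v} = \mathbf{v}' - \mathbf{u}'$.

Now $\mathbf{u} - \mathbf{v} \in \mathfrak{U}$, $\mathbf{v}' - \mathbf{u}' \in \mathfrak{U}'$, and since $\mathbf{0}$ is the only common vector
of $\mathfrak{U}$ and $\mathfrak{U}'$ we have $\mathbf{u} = \mathbf{v}$, $\mathbf{u}' = \mathbf{v}'$. The theorem is therefore
proved.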
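The second half of the proof is constructive. Once bases $\{\mathbf{y}_1, \dots, \mathbf{y}_r\}$ and $\{\mathbf{z}_1, \dots, \mathbf{z}_s\}$ of complementary subspaces are known, the decomposition (2.3.5) of a given $\mathbf{x}$ is obtained in two steps: find the coordinates of $\mathbf{x}$ with respect to the combined basis, as in (2.3.6), and then split them into the $\mathbf{y}$-part and the $\mathbf{z}$-part. The Python sketch below assumes rational components, and `solve`, `combine` and `split` are hypothetical helper names. If the combined vectors are not a basis of $\mathfrak{V}_n$, then the subspaces are not complements, and `solve` raises an error.

```python
from fractions import Fraction


def solve(columns, x):
    """Unique alpha with sum(alpha_j * columns[j]) == x, assuming the n vectors
    in `columns` form a basis of V_n (Gauss-Jordan elimination)."""
    n = len(x)
    m = [[Fraction(col[i]) for col in columns] + [Fraction(x[i])] for i in range(n)]
    for c in range(n):
        p = next((i for i in range(c, n) if m[i][c] != 0), None)
        if p is None:
            raise ValueError("the vectors do not form a basis of V_n")
        m[c], m[p] = m[p], m[c]
        piv = m[c][c]
        m[c] = [v / piv for v in m[c]]
        for i in range(n):
            if i != c and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[c])]
    return [m[i][n] for i in range(n)]


def combine(coeffs, vecs, n):
    """The vector sum_j coeffs[j] * vecs[j] of order n (zero vector if vecs is empty)."""
    return tuple(sum((a * Fraction(v[i]) for a, v in zip(coeffs, vecs)), Fraction(0))
                 for i in range(n))


def split(x, ys, zs):
    """Decompose x = u + u' as in (2.3.5), where ys and zs are bases of
    complementary subspaces U and U' (so ys + zs is a basis of V_n)."""
    n = len(x)
    if len(ys) + len(zs) != n:
        raise ValueError("d(U) + d(U') must equal n")
    alpha = solve(list(ys) + list(zs), x)   # coordinates as in (2.3.6)
    r = len(ys)
    return combine(alpha[:r], ys, n), combine(alpha[r:], zs, n)


u, u_prime = split((2, 5, 7), [(1, 1, 0)], [(1, 0, 0), (0, 0, 1)])
assert u == (5, 5, 0) and u_prime == (-3, 0, 7)
```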

If $\mathfrak{S}, \mathfrak{S}'$ are two sets of objects we shall denote by $\mathfrak{S} \cup \mathfrak{S}'$ the
set consisting of all objects belonging to at least one of $\mathfrak{S}, \mathfrak{S}'$. Two
sets of objects are said to be DISJOINT if they possess no common
object.

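Theorem 2.3.8. Let $\mathfrak{B}, \mathfrak{B}'$ be any bases of $\mathfrak{U}, \mathfrak{U}'$ respectively.
Then $\mathfrak{U}, \mathfrak{U}'$ are complements if and only if $\mathfrak{B}, \mathfrak{B}'$ are disjoint and
$\mathfrak{B} \cup \mathfrak{B}'$† is a basis of $\mathfrak{V}_n$.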

Let $d(U) = r$, $d(U') = s$, and write

$$\mathfrak{B} = \{y_1, \ldots, y_r\}, \quad \mathfrak{B}' = \{z_1, \ldots, z_s\}.$$

If $U, U'$ are complements, then, by Theorem 2.3.7, $r + s = n$ and $0$ is the only vector common to $U$ and $U'$; so that, in particular, $\mathfrak{B}, \mathfrak{B}'$ are disjoint. It follows by Theorem 2.3.6 that the $n$ vectors $y_1, \ldots, y_r, z_1, \ldots, z_s$ are linearly independent and so constitute a basis of $\mathfrak{V}_n$. In other words, $\mathfrak{B} \cup \mathfrak{B}'$ is a basis of $\mathfrak{V}_n$.

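Assume, next, that $\mathfrak{B}, \mathfrak{B}'$ are disjoint and that $\mathfrak{B} \cup \mathfrak{B}'$ is a basis of $\mathfrak{V}_n$. Then $r + s = n$. Moreover, if $x \in U$, $x \in U'$, then there exist scalars $\alpha_1, \ldots, \alpha_r, \beta_1, \ldots, \beta_s$ such that

$$x = \alpha_1 y_1 + \dots + \alpha_r y_r = \beta_1 z_1 + \dots + \beta_s z_s.$$

Hence

$$\alpha_1 y_1 + \dots + \alpha_r y_r + (-\beta_1) z_1 + \dots + (-\beta_s) z_s = 0.$$

But $y_1, \ldots, y_r, z_1, \ldots, z_s$ constitute a basis of $\mathfrak{V}_n$, and so are linearly independent. Thus

$$\alpha_1 = \dots = \alpha_r = \beta_1 = \dots = \beta_s = 0.$$

Hence $x = 0$, i.e. $0$ is the only vector common to $U$ and $U'$. It follows, by Theorem 2.3.7, that $U, U'$ are complements. The proof is therefore complete.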
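Theorem 2.3.8 turns the question whether two subspaces of $\mathfrak{V}_n$ are complements into a finite check on bases. As an illustration only, here is a minimal Python sketch, assuming NumPy and floating-point data; the function name `are_complements` is ours, not the text's. It tests whether the $r + s$ given basis vectors number exactly $n$ and are linearly independent, which (a repeated vector counting twice) amounts to requiring the two bases to be disjoint with union a basis of $\mathfrak{V}_n$.

```python
import numpy as np

def are_complements(basis_u, basis_u_prime, n):
    """Check Theorem 2.3.8 for subspaces U, U' of V_n given by bases.

    Each argument is a list of linearly independent vectors of length n.
    U and U' are complements exactly when the two bases are disjoint and
    together form a basis of V_n; for lists of vectors this amounts to
    having n vectors in total that are linearly independent.
    """
    vectors = list(basis_u) + list(basis_u_prime)
    if len(vectors) != n:
        return False
    # Rows of the matrix are the vectors; rank n means linear independence.
    matrix = np.array(vectors, dtype=complex)
    return bool(np.linalg.matrix_rank(matrix) == n)

# U = span{(1,0,0), (0,1,0)} in V_3.
u = [(1, 0, 0), (0, 1, 0)]
print(are_complements(u, [(1, 1, 1)], 3))  # True
print(are_complements(u, [(1, 1, 0)], 3))  # False: (1,1,0) already lies in U
```

The rank test is exact for small integer data but relies on NumPy's default tolerance in general.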

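Exercise 2.3.8. Let $\mathfrak{B}, \mathfrak{C}$ be bases of $U$, and $\mathfrak{B}', \mathfrak{C}'$ bases of $U'$. Show that if $\mathfrak{B}, \mathfrak{B}'$ are disjoint and $\mathfrak{B} \cup \mathfrak{B}'$ is a basis of $\mathfrak{V}_n$, then $\mathfrak{C} \cup \mathfrak{C}'$ is also a basis of $\mathfrak{V}_n$.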

† The vectors in $\mathfrak{B} \cup \mathfrak{B}'$ may, of course, be ordered in any way.




We have, so far, derived a number of properties of complements 
without having yet considered the question of their existence. 
However, the theorem just proved enables us to settle this question. 

Theorem 2.3.9. Every subspace of $\mathfrak{V}_n$ possesses a complement.

Let $U$ be a subspace of $\mathfrak{V}_n$ and let $\{x_1, \ldots, x_r\}$ be a basis of $U$. By the corollary to Theorem 2.3.5 there exist vectors $x_{r+1}, \ldots, x_n$ such that $\{x_1, \ldots, x_r, x_{r+1}, \ldots, x_n\}$ is a basis of $\mathfrak{V}_n$. By Theorem 2.3.8 the vector space $U'$, defined as the space spanned by $x_{r+1}, \ldots, x_n$, is a complement of $U$.

The completion of the basis $\{x_1, \ldots, x_r\}$ of $U$, made use of in the above proof, does not lead to a uniquely determined basis of $\mathfrak{V}_n$. There is, therefore, no reason to suppose that the complement of $U$ is unique; and, in fact, if $0 < d(U) < n$, it is easy to verify that it is not.
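The proof of Theorem 2.3.9 completes a basis of $U$ to a basis of $\mathfrak{V}_n$. One concrete way to do this is shown below as a hedged Python sketch for real vectors, assuming NumPy. The helper `complete_basis` is hypothetical, and trying $e_1, \ldots, e_n$ in order is our choice, not the text's. It adjoins each unit vector that is not already a linear combination of the vectors collected so far. The last lines illustrate the non-uniqueness of complements.

```python
import numpy as np

def complete_basis(basis, n):
    """Extend a linearly independent list of real vectors to a basis of V_n.

    Tries e_1, ..., e_n in turn and keeps each one that raises the rank.
    Returns only the added vectors; by Theorem 2.3.8 they span a
    complement of the space spanned by `basis`.
    """
    current = [np.asarray(v, dtype=float) for v in basis]
    added = []
    for i in range(n):
        e_i = np.zeros(n)
        e_i[i] = 1.0
        if np.linalg.matrix_rank(np.array(current + [e_i])) == len(current) + 1:
            current.append(e_i)
            added.append(e_i)
    return added

# U = span{(1, 0)} in V_2: the greedy completion adds e_2 = (0, 1) ...
print(complete_basis([(1, 0)], 2))
# ... but span{(1, 1)} is another complement, since (1, 0), (1, 1) are independent.
print(np.linalg.matrix_rank(np.array([(1.0, 0.0), (1.0, 1.0)])))  # 2
```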

Exercise 2.3.9. Extend the theory of complements to more than two subspaces of $\mathfrak{V}_n$.


2.4. Vector representation of linear manifolds

2.4.1. So far we have been careful to distinguish between the 
general notion of a linear manifold and the special instance of a 
vector space. We shall now see, however, that for many purposes 
this distinction need not be maintained. 

It is necessary, first of all, to introduce the concept of isomorphism. 
Roughly speaking, we call two mathematical systems isomorphic if, however they may differ in the nature of their elements, they
possess the same structure. This means that it is possible to set up 
a biunique correspondence between the elements of the two systems 
in such a way that there is also an exact correspondence between 
the operations involved in the definitions of the two systems. In 
the case of linear manifolds the operations in question are, of course, 
those of addition and multiplication by scalars. 

We shall use the symbol $\leftrightarrow$ to denote biunique correspondence; thus $X \leftrightarrow X'$ $(X \in \mathfrak{M},\ X' \in \mathfrak{M}')$ means that a biunique correspondence has been set up between $\mathfrak{M}$ and $\mathfrak{M}'$ and that in this correspondence the element $X$ of $\mathfrak{M}$ is associated with the element $X'$ of $\mathfrak{M}'$.

The definition of isomorphism for linear manifolds may now 
be framed as follows. 




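Definition 2.4.1. Two linear manifolds $\mathfrak{M}, \mathfrak{M}'$ over a reference field $\mathfrak{F}$ are ISOMORPHIC (in symbols: $\mathfrak{M} \cong \mathfrak{M}'$) if a biunique correspondence $X \leftrightarrow X'$ $(X \in \mathfrak{M},\ X' \in \mathfrak{M}')$ can be set up between $\mathfrak{M}$ and $\mathfrak{M}'$ in such a way that, whenever $X \leftrightarrow X'$, $Y \leftrightarrow Y'$, $\alpha \in \mathfrak{F}$, we have $\alpha X \leftrightarrow \alpha X'$ and $X + Y \leftrightarrow X' + Y'$. The correspondence itself is called an isomorphism between $\mathfrak{M}$ and $\mathfrak{M}'$.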

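In other words, $\mathfrak{M} \cong \mathfrak{M}'$ if a biunique correspondence can be set up between $\mathfrak{M}$ and $\mathfrak{M}'$ such that, for all $X, Y \in \mathfrak{M}$ and all $\alpha \in \mathfrak{F}$, $(\alpha X)' = \alpha X'$ and $(X + Y)' = X' + Y'$, where $Z'$ denotes the (unique) element of $\mathfrak{M}'$ which corresponds to $Z$ in $\mathfrak{M}$.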

It is instructive to express the same idea in yet another way by making use of the functional notation. An isomorphism between $\mathfrak{M}$ and $\mathfrak{M}'$ is a ‘function’ $\phi(X)$ defined uniquely for all $X \in \mathfrak{M}$ and having the following properties. (i) For every $X$, $\phi(X)$ is an element of $\mathfrak{M}'$. (ii) Given any element $X' \in \mathfrak{M}'$, there exists precisely one element $X \in \mathfrak{M}$ such that $\phi(X) = X'$. (iii) For all $X, Y \in \mathfrak{M}$ and all $\alpha \in \mathfrak{F}$ we have $\phi(\alpha X) = \alpha\phi(X)$ and $\phi(X + Y) = \phi(X) + \phi(Y)$.

Two linear manifolds are, of course, said to be isomorphic if there 
exists an isomorphism between them. 

Exercise 2.4.1. Show that the correspondence $X \leftrightarrow \alpha X$, where $\alpha$ is any fixed non-zero scalar, is an isomorphism of $\mathfrak{M}$ with itself.

Exercise 2.4.2. Show that the correspondence $(x_1, x_2) \leftrightarrow (x_1, -x_2)$ is an isomorphism of $\mathfrak{V}_2$ with itself.

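It follows at once from Definition 2.4.1 that if $\mathfrak{M} \cong \mathfrak{M}'$ and $X_i \leftrightarrow X'_i$ $(X_i \in \mathfrak{M},\ X'_i \in \mathfrak{M}',\ i = 1, \ldots, k)$ and if $\alpha_1, \ldots, \alpha_k \in \mathfrak{F}$, then

$$\alpha_1 X_1 + \dots + \alpha_k X_k \leftrightarrow \alpha_1 X'_1 + \dots + \alpha_k X'_k.$$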

Thus, if $\mathfrak{M}$ and $\mathfrak{M}'$ are isomorphic and we form a linear combination of some elements in $\mathfrak{M}$ and the same linear combination of the corresponding elements in $\mathfrak{M}'$, then the resulting elements in $\mathfrak{M}$ and $\mathfrak{M}'$ again correspond to each other.

Some further properties of isomorphisms between linear manifolds may easily be inferred.

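Theorem 2.4.1. If $\mathfrak{M} \cong \mathfrak{M}'$, then the zero elements in $\mathfrak{M}$ and $\mathfrak{M}'$ correspond to each other.

For let $X \in \mathfrak{M}$, $X' \in \mathfrak{M}'$ be any elements such that $X \leftrightarrow X'$. Then $0X \leftrightarrow 0X'$, i.e. $\Theta \leftrightarrow \Theta'$, where $\Theta, \Theta'$ are the zero elements of $\mathfrak{M}, \mathfrak{M}'$ respectively.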




Theorem 2.4.2. Isomorphism preserves linear dependence and independence. More precisely,† if $\mathfrak{M} \cong \mathfrak{M}'$ and $X_i \leftrightarrow X'_i$ $(X_i \in \mathfrak{M},\ X'_i \in \mathfrak{M}',\ i = 1, \ldots, k)$ and if $X_1, \ldots, X_k$ are linearly dependent (linearly independent), then $X'_1, \ldots, X'_k$ are linearly dependent (linearly independent).

Let $\Theta, \Theta'$ denote the zero elements of $\mathfrak{M}, \mathfrak{M}'$ respectively. Suppose that $X_1, \ldots, X_k$ are linearly dependent, i.e. that there exist scalars $\alpha_1, \ldots, \alpha_k$, not all 0, such that

$$\alpha_1 X_1 + \dots + \alpha_k X_k = \Theta.$$

Then, in view of Theorem 2.4.1, we have

$$\alpha_1 X'_1 + \dots + \alpha_k X'_k = \Theta'.$$

Hence $X'_1, \ldots, X'_k$ are linearly dependent. Again, since isomorphism is a symmetrical relation, it follows that linear dependence of $X'_1, \ldots, X'_k$ implies linear dependence of $X_1, \ldots, X_k$. The proof of the assertion is therefore complete.

As an immediate consequence of Theorem 2.4.2 we have the 
following result. 

Corollary. If two linear manifolds are isomorphic they have the 
same dimensionality. 

We are now in a position to show in what sense the study of linear 
manifolds can be replaced by that of vector spaces. 

Theorem 2.4.3. (Isomorphism theorem for linear manifolds)

Every linear manifold $\mathfrak{M}$ over $\mathfrak{F}$ which has finite dimensionality $r$ is isomorphic to the total vector space $\mathfrak{V}_r(\mathfrak{F})$.

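Let $\mathfrak{B} = \{E_1, \ldots, E_r\}$ be any basis of $\mathfrak{M}$. Then, corresponding to each $X \in \mathfrak{M}$, there exist unique scalars $x_1, \ldots, x_r$ such that

$$X = x_1 E_1 + \dots + x_r E_r.$$

A biunique correspondence between the elements of $\mathfrak{M}$ and those of $\mathfrak{V}_r$ may now be specified by the scheme

$$X = x_1 E_1 + \dots + x_r E_r \;\leftrightarrow\; (x_1, \ldots, x_r) = x \qquad (x \in \mathfrak{V}_r).$$

It is then clear that, for every $\alpha \in \mathfrak{F}$, $\alpha X$ corresponds to $\alpha x$. Again, if $Y \in \mathfrak{M}$ corresponds to $y \in \mathfrak{V}_r$, then $X + Y$ corresponds to $x + y$. This proves the theorem.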

† The form of statement ‘if P (Q) then R (S)’ means that P implies R, and Q implies S.





Definition 2.4.2. If $\mathfrak{B} = \{E_1, \ldots, E_r\}$ is a basis of $\mathfrak{M}$ and the element $X \in \mathfrak{M}$ has the form $X = x_1 E_1 + \dots + x_r E_r$, then the vector $x = (x_1, \ldots, x_r)$ is said to represent $X$ with respect to $\mathfrak{B}$; and we


write

$$x = \mathcal{R}(X; \mathfrak{B}).$$


The scalars $x_1, \ldots, x_r$ may be referred to as the coordinates of $X$ with respect to $\mathfrak{B}$.

We have thus arrived at a representation of $\mathfrak{M}$ by $\mathfrak{V}_r$ which preserves the structure of $\mathfrak{M}$. This representation is not, however, unique since it depends on the arbitrary choice of a basis in $\mathfrak{M}$. By varying this choice of basis we obtain different representations of $\mathfrak{M}$ by $\mathfrak{V}_r$.
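To make Definition 2.4.2 concrete, the sketch below takes for $\mathfrak{M}$ a subspace of $\mathfrak{V}_n$ given by an explicit basis. This is an assumption: the text's $\mathfrak{M}$ is an abstract linear manifold. The sketch computes the representing vector by solving $X = x_1E_1 + \dots + x_rE_r$. It assumes NumPy, and `coordinates` is a hypothetical name. Changing the basis changes the representing vector, while linear combinations are carried to linear combinations, as in the proof of Theorem 2.4.3.

```python
import numpy as np

def coordinates(x, basis):
    """Return the vector representing x with respect to the ordered basis.

    Solves x = x_1 E_1 + ... + x_r E_r, the basis vectors E_i being the
    columns of a matrix. Least squares allows r < n; the residual check
    rejects any x that does not lie in the span of the basis.
    """
    e = np.column_stack([np.asarray(v, dtype=complex) for v in basis])
    x = np.asarray(x, dtype=complex)
    coeffs, *_ = np.linalg.lstsq(e, x, rcond=None)
    if not np.allclose(e @ coeffs, x):
        raise ValueError("x does not lie in the space spanned by the basis")
    return coeffs

b1 = [(1, 1, 0), (0, 1, 1), (1, 0, 1)]
b2 = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
x, y = np.array([2, 3, 1]), np.array([0, 1, -1])
print(coordinates(x, b1).real)   # approximately [2, 1, 0]
print(coordinates(x, b2).real)   # approximately [2, 3, 1]: a different representation
# The correspondence respects linear combinations (Theorem 2.4.3).
assert np.allclose(coordinates(2 * x + y, b1), 2 * coordinates(x, b1) + coordinates(y, b1))
```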


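Exercise 2.4.3. Show that if $\mathfrak{B}$ is any basis of $\mathfrak{M}$ and $\Theta$ is the zero element of $\mathfrak{M}$, then $\Theta$ is represented with respect to $\mathfrak{B}$ by the zero vector.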

Exercise 2.4.4. Prove that if two linear manifolds have the same 
finite dimensionality then they are isomorphic. 

For the special case of vector spaces, Theorem 2.4.3 reduces to 
the following result. 

Corollary. A vector space of order $n$ and dimensionality $r$ is isomorphic to the total vector space of order $r$.


Exercise 2.4.5. Illustrate this corollary by considering the vector space consisting of all vectors of the form $(0, x_2, \ldots, x_n)$.

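Exercise 2.4.6. Let $\mathfrak{B}$ be a basis of $\mathfrak{V}_n$. If $x, y \in \mathfrak{V}_n$, explain what is meant by the statement ‘$x$ represents $y$ with respect to $\mathfrak{B}$’. Show that every vector in $\mathfrak{V}_n$ represents itself with respect to $\mathfrak{E} = \{e_1, \ldots, e_n\}$.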

2.4.2. A vector space which is isomorphic to a linear manifold 
is said to represent it, and the importance of such representations 
lies in the fact that they furnish us with a simple and effective 
technique for studying the properties of linear manifolds. In view 
of this we shall from now on confine our attention for the most 
part to vector spaces and return only occasionally to the more 
general idea of a linear manifold. Nevertheless, it is sometimes 
more illuminating to think in terms of linear manifolds rather 
than in terms of vector spaces representing them. For in the $n$-dimensional vector space $\mathfrak{V}_n$ we have a favoured basis $\{e_1, \ldots, e_n\}$, and any vector $(x_1, \ldots, x_n)$ is expressible in the simple form

$$(x_1, \ldots, x_n) = x_1 e_1 + \dots + x_n e_n.$$

In a linear manifold, on the other hand, all bases are of exactly the same standing; and while the existence of a favoured basis in a




vector space makes for convenience of calculation, it may obscure 
some of the essential features of the problem. 

Representation of a given linear manifold $\mathfrak{M}$ by a vector space is possible, as we have seen, in more than one way. Indeed, it is clear from the proof of Theorem 2.4.3 that each choice of a basis in $\mathfrak{M}$ leads to a different isomorphism between $\mathfrak{M}$ and $\mathfrak{V}_r$. All such representations of $\mathfrak{M}$ by $\mathfrak{V}_r$ are of equal status, none having precedence over any other. We should think of $\mathfrak{M}$ as an entity possessing infinitely many representations in terms of $\mathfrak{V}_r$, but existing independently of them all.

The idea of representing a linear manifold by a vector space is, of course, already familiar to the reader since it is virtually the same as that of representing points by means of cartesian coordinates. This representation of points is arrived at in the following manner. Let $S$ be ordinary three-dimensional space. If an origin $O$ is fixed, then with every point $X$ of $S$ is associated a unique directed segment $OX$.† By obvious geometrical constructions the directed segments can be added together and multiplied by real scalars, and it is easy to verify that, with respect to these operations, they constitute a linear manifold $\mathfrak{M}$.

Every choice of coordinate axes, rectangular or oblique, through $O$ is equivalent to a choice of basis in $\mathfrak{M}$. For let a system $C$ of coordinate axes through $O$ be given and let $A_1, A_2, A_3$ be points on the axes such that the segments $OA_1, OA_2, OA_3$, measured positively, are all of unit length. Then $OA_1, OA_2, OA_3$ are linearly independent elements of $\mathfrak{M}$; and if the coordinates, with respect to $C$, of a point $X$ are $x_1, x_2, x_3$, then

$$OX = x_1\,OA_1 + x_2\,OA_2 + x_3\,OA_3.$$

Thus $OA_1, OA_2, OA_3$ constitute a basis of $\mathfrak{M}$; and conversely, every choice of basis in $\mathfrak{M}$ (together with a choice of scale) determines a unique coordinate system in $S$.

If now the coordinates, with respect to $C$, of points $X, Y \in S$ are $(x_1, x_2, x_3)$, $(y_1, y_2, y_3)$ respectively, then the coordinates of the points corresponding to the segments $\alpha\,OX$, $OX + OY$ are $(\alpha x_1, \alpha x_2, \alpha x_3)$, $(x_1 + y_1, x_2 + y_2, x_3 + y_3)$. Hence $\mathfrak{M}$ is isomorphic to

† The directed segments are often referred to as ‘position vectors’ of points with respect to $O$.




$\mathfrak{V}_3$, and each choice of axes in $S$ (i.e. each choice of basis in $\mathfrak{M}$) leads to a different mode of representation of $\mathfrak{M}$ by $\mathfrak{V}_3$.

Situations of the above type where an entity (such as, for instance, 
a linear manifold) admits of infinitely many representations but 
exists independently of them are quite common in algebra, and we 
shall meet further instances when we come to consider linear, 
bilinear, and quadratic operators in Chapters IV and XII. 

2.5. Inner products and orthonormal bases 

2.5.1. Let a system of rectangular coordinates be introduced in three-dimensional space, and let $(x_1, x_2, x_3)$, $(y_1, y_2, y_3)$ be the coordinates of two points. We may think of these triads as vectors $x, y$. The expression

$$x_1 y_1 + x_2 y_2 + x_3 y_3$$

(which is familiar from elementary geometry) is called the scalar product or inner product of $x$ and $y$. This notion is obviously capable of generalization to any number of dimensions. If $x = (x_1, \ldots, x_n)$, $y = (y_1, \ldots, y_n)$ are real vectors we shall define their inner product $(x, y)$ as the expression

$$(x, y) = x_1 y_1 + \dots + x_n y_n. \qquad (2.5.1)$$

We shall also define the length $|x|$ of the vector $x$ as

$$|x| = +(x, x)^{1/2} = +(x_1^2 + \dots + x_n^2)^{1/2}. \qquad (2.5.2)$$

When n = 2 or n = 3, and a system of rectangular cartesian 
coordinates is given, then x can be identified (in the obvious way) 
with a position vector. In that case $|x|$ is simply the length of the
position vector. 

The definitions just given are no longer appropriate when we 
deal with complex vectors. We shall, therefore, introduce modified 
definitions of inner product and length for the general case of 
complex vectors; these definitions will naturally reduce to (2.5.1) 
and (2.5.2) when the vectors are real. It is to be understood that 
all vectors and scalars which occur below are complex unless the 
contrary is stated. As usual $\bar{\alpha}$ denotes the complex conjugate of $\alpha$.

2.5.2. Definition 2.5.1. The inner product $(x, y)$ of two complex vectors $x = (x_1, \ldots, x_n)$, $y = (y_1, \ldots, y_n)$ is defined as

$$(x, y) = \bar{x}_1 y_1 + \dots + \bar{x}_n y_n.$$

A number of identities follow trivially from this definition. 




Theorem 2.5.1.

(i) $(x, y) = \overline{(y, x)}$;

(ii) $\alpha(x, y) = (\bar{\alpha} x, y) = (x, \alpha y)$;

(iii) $(x + y, z) = (x, z) + (y, z)$, $(x, y + z) = (x, y) + (x, z)$.

Corollary. If $x, y$ are real vectors and $\alpha$ a real scalar, then

$(x, y) = (y, x)$, $\alpha(x, y) = (\alpha x, y) = (x, \alpha y)$.
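In Definition 2.5.1 the conjugation falls on the first vector. NumPy's `np.vdot` (assumed available) conjugates its first argument, so it computes exactly $(x, y)$ in this convention; `inner` is our wrapper name. The sketch checks the identities of Theorem 2.5.1 on one sample of vectors, which illustrates them but of course does not prove them.

```python
import numpy as np

def inner(x, y):
    """Inner product (x, y) = conj(x_1) y_1 + ... + conj(x_n) y_n.

    np.vdot conjugates its first argument, matching Definition 2.5.1:
    (x, y) is conjugate-linear in x and linear in y.
    """
    return np.vdot(np.asarray(x, dtype=complex), np.asarray(y, dtype=complex))

x = np.array([1 + 2j, 3j])
y = np.array([2 - 1j, 1 + 1j])
z = np.array([1j, -1])
alpha = 2 - 3j

# Theorem 2.5.1 (i): (x, y) is the complex conjugate of (y, x).
assert np.isclose(inner(x, y), np.conj(inner(y, x)))
# Theorem 2.5.1 (ii): alpha (x, y) = (conj(alpha) x, y) = (x, alpha y).
assert np.isclose(alpha * inner(x, y), inner(np.conj(alpha) * x, y))
assert np.isclose(alpha * inner(x, y), inner(x, alpha * y))
# Theorem 2.5.1 (iii): additivity in each argument.
assert np.isclose(inner(x + y, z), inner(x, z) + inner(y, z))
assert np.isclose(inner(x, y + z), inner(x, y) + inner(x, z))
```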

Definition 2.5.2. (i) The length (or norm) $|x|$ of the complex vector $x = (x_1, \ldots, x_n)$ is defined as

$$|x| = +(x, x)^{1/2} = +(|x_1|^2 + \dots + |x_n|^2)^{1/2}.$$

(ii) The expression $|x - y|$ will be called the separation of $x$ and $y$.

When $n = 2$ or $3$ and the vectors are real, these ideas have a simple geometrical interpretation. Consider the case $n = 3$. If a system of rectangular coordinates is introduced in ordinary geometrical space $S$, every point $P$ becomes associated with a definite vector $OP = x$ with components $(x_1, x_2, x_3)$; and we thus have a biunique correspondence between the points of $S$ and the vectors of $\mathfrak{V}_3$. If $P, Q$ are any two points and $x, y$ are the corresponding vectors, the separation $|x - y|$ of the vectors is simply the distance $PQ$. The length of the vector $x$, on the other hand, is equal to the distance of the point $P$ from the origin $O$.

We return now to the consideration of complex vectors and note that $|x|$ is always real. Moreover, $|x| \ge 0$ for all $x$, and $|x| = 0$ if and only if $x = 0$. Again it is clear that $|\alpha x| = |\alpha|\,|x|$.†

Exercise 2.5.1. Show that $|x - y| = |y - x|$.

We next consider some inequalities connected with inner 
products. 

Theorem 2.5.2. $|(x, y)| \le |x|\,|y|$.

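To prove this result we write $x = (x_1, \ldots, x_n)$, $y = (y_1, \ldots, y_n)$. The assertion then becomes

$$\Big|\sum_{r=1}^{n} \bar{x}_r y_r\Big|^2 \le \Big(\sum_{r=1}^{n} |x_r|^2\Big)\Big(\sum_{r=1}^{n} |y_r|^2\Big).$$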

† The reader should be careful to distinguish between the two uses of vertical bars; $|\alpha|$ denotes the modulus of the number $\alpha$ and $|x|$ the length of the vector $x$.




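This is the inequality of Cauchy and Schwarz for complex numbers. We may establish it by observing that

$$\frac{1}{2}\sum_{r,s=1}^{n} |x_r y_s - x_s y_r|^2 = \frac{1}{2}\sum_{r,s=1}^{n} (x_r y_s - x_s y_r)(\bar{x}_r \bar{y}_s - \bar{x}_s \bar{y}_r) = \Big(\sum_{r=1}^{n} |x_r|^2\Big)\Big(\sum_{r=1}^{n} |y_r|^2\Big) - \Big|\sum_{r=1}^{n} \bar{x}_r y_r\Big|^2,$$

and the left-hand side is plainly non-negative.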

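Theorem 2.5.3. (Triangle inequality)

$$|x + y| \le |x| + |y|. \qquad (2.5.3)$$

Using Theorem 2.5.1, (iii) and (i), we obtain

$$\begin{aligned} |x + y|^2 &= (x + y, x + y) = (x, x) + (y, y) + (x, y) + (y, x) \\ &= |x|^2 + |y|^2 + (x, y) + \overline{(x, y)} \\ &\le |x|^2 + |y|^2 + 2|(x, y)|. \end{aligned}$$

The assertion now follows by Theorem 2.5.2, since the last expression does not exceed $|x|^2 + |y|^2 + 2|x|\,|y| = (|x| + |y|)^2$.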
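The identity used in the proof of Theorem 2.5.2 can be checked numerically. The following sketch assumes NumPy and a fixed random seed. It compares both sides of the identity for random complex vectors and also checks the triangle inequality (2.5.3), with a small allowance for rounding. A numerical check of this kind is, naturally, no substitute for the proof; `lagrange_gap` is our own name.

```python
import numpy as np

def lagrange_gap(x, y):
    """Return both sides of the identity used to prove Theorem 2.5.2.

    left:  (sum |x_r|^2)(sum |y_r|^2) - |sum conj(x_r) y_r|^2
    right: (1/2) * sum over r, s of |x_r y_s - x_s y_r|^2
    """
    x = np.asarray(x, dtype=complex)
    y = np.asarray(y, dtype=complex)
    left = np.sum(abs(x) ** 2) * np.sum(abs(y) ** 2) - abs(np.vdot(x, y)) ** 2
    cross = np.outer(x, y) - np.outer(y, x)   # entry (r, s) is x_r y_s - x_s y_r
    right = 0.5 * np.sum(abs(cross) ** 2)
    return left, right

rng = np.random.default_rng(0)
for _ in range(100):
    x = rng.normal(size=4) + 1j * rng.normal(size=4)
    y = rng.normal(size=4) + 1j * rng.normal(size=4)
    left, right = lagrange_gap(x, y)
    assert np.isclose(left, right) and right >= 0        # hence |(x, y)| <= |x| |y|
    assert np.linalg.norm(x + y) <= np.linalg.norm(x) + np.linalg.norm(y) + 1e-12  # (2.5.3)
```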

Theorem 2.5.3 may be restated in the form $|x - y| \le |x - z| + |y - z|$.

If this inequality is interpreted for the case of real vectors of order 
2, it is seen to state that the sum of two sides of a triangle is not 
smaller than the third side. It is for this reason that (2.5.3) is called 
the ‘triangle inequality’. 

2.5.3. Definition 2.5.3. (i) A vector $x$ is a unit vector if $|x| = 1$. (ii) The process of replacing a non-zero vector $x$ by the unit vector $|x|^{-1}x$ is known as normalization of $x$.

The next definition is suggested once again by analogy with 
elementary geometry. If the inner product of two real vectors of order 2 or 3 vanishes, then the corresponding directed segments are orthogonal to each other. We now extend the notion of orthogonality to complex vectors of arbitrary order.

Definition 2.5.4. The vectors $x, y$ are orthogonal (to each other) if $(x, y) = 0$.




The wording of the definition suggests that orthogonality is a symmetrical relation. This is, in fact, the case since, by Theorem 2.5.1 (i), $(x, y) = 0$ implies $(y, x) = 0$.

Exercise 2.5.2. Show that if $x$ is orthogonal to every vector of a basis of a vector space $\mathfrak{V}$, then it is orthogonal to every vector of $\mathfrak{V}$.

Exercise 2.5.3. Show that orthogonality is not preserved by isomorphism, i.e. if the vector spaces $U, U'$ are isomorphic and $x, y \in U$ correspond to $x', y' \in U'$ respectively, then the relation $(x, y) = 0$ does not imply $(x', y') = 0$.

Definition 2.5.5. (i) The vectors $x_1, \ldots, x_k$ form an ORTHOGONAL SET if they are orthogonal in pairs. (ii) The vectors $x_1, \ldots, x_k$ form an ORTHONORMAL SET if they are orthogonal in pairs and each vector is a unit vector.†

If a basis of a vector space is an orthogonal (orthonormal) set, 
we call it an orthogonal (orthonormal) basis.

From an orthogonal set of non-zero vectors we can at once obtain
an orthonormal set by normalizing each vector. 

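Theorem 2.5.4. If a number of non-zero vectors form an orthogonal set, then they are linearly independent.

For let $x_1, \ldots, x_m$ be non-zero vectors such that

$$(x_r, x_s) = 0 \qquad (r, s = 1, \ldots, m;\ r \ne s).$$

If $\alpha_1, \ldots, \alpha_m$ are scalars such that

$$\alpha_1 x_1 + \dots + \alpha_m x_m = 0,$$

then, forming inner products with $x_r$, we obtain

$$\alpha_1 (x_r, x_1) + \dots + \alpha_m (x_r, x_m) = 0.$$

Hence $\alpha_r (x_r, x_r) = 0$ and so, since $(x_r, x_r) = |x_r|^2 \ne 0$, $\alpha_r = 0$. Since this result holds for $r = 1, \ldots, m$, the theorem is proved.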

Corollary 1. The vectors of an orthonormal set are linearly independent.

Corollary 2. If $n$ non-zero vectors of order $n$ form an orthogonal set, then they constitute a basis of the total vector space $\mathfrak{V}_n$.

† It is sometimes more convenient to speak of ‘orthogonal vectors’ or ‘orthonormal vectors’ when we refer to an orthogonal set or an orthonormal set of vectors. An orthonormal set of vectors consisting of a single vector is, of course, a unit vector.



For, in view of the theorem just proved, the $n$ vectors are linearly independent. Hence, by Corollary 1 to Theorem 2.3.3 (p. 53), they constitute a basis of $\mathfrak{V}_n$.

Exercise 2.5.4. Suppose that the vectors $u_1, \ldots, u_n \in \mathfrak{V}_n$ form an orthonormal set. Show that any vector $v \in \mathfrak{V}_n$ can be expressed in the form

$$v = \sum_{i=1}^{n} (u_i, v)\,u_i.$$
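Exercise 2.5.4 can be illustrated (not proved) numerically. In the sketch below, which assumes NumPy, the coefficients $(u_i, v)$ are computed with `np.vdot`, which conjugates $u_i$ as Definition 2.5.1 requires. The orthonormal basis of $\mathfrak{V}_2$ used here is our own choice, and `expand` is a hypothetical helper.

```python
import numpy as np

def expand(v, onb):
    """Return the coefficients (u_i, v) and the vector sum of (u_i, v) u_i."""
    coeffs = [np.vdot(u, v) for u in onb]   # np.vdot conjugates u
    return coeffs, sum(c * u for c, u in zip(coeffs, onb))

s = 1 / np.sqrt(2)
onb = [s * np.array([1, 1j]), s * np.array([1, -1j])]   # orthonormal in V_2
v = np.array([3 + 1j, 2 - 4j])
coeffs, rebuilt = expand(v, onb)
assert np.allclose(rebuilt, v)
```

Because the basis is orthonormal, the sum of $|(u_i, v)|^2$ also equals $|v|^2$; compare Problem 13 at the end of the chapter.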

We shall next turn to the question of orthonormal bases. The total vector space $\mathfrak{V}_n$ clearly possesses at least one orthonormal basis, namely, $\mathfrak{E} = \{e_1, \ldots, e_n\}$. It is not, however, obvious whether $\mathfrak{V}_n$ possesses an orthonormal basis containing a given set of orthonormal vectors, though geometrical intuition suggests that this is the case. Again, in dealing with a vector space other than a total vector space we do not as yet know whether an orthonormal basis need exist at all. These questions will now be settled.

Theorem 2.5.5. (Theorem on orthonormal bases)

Every non-null vector space $\mathfrak{V}$ possesses an orthonormal basis. Moreover, every orthonormal set of vectors in $\mathfrak{V}$ is part of an orthonormal basis of $\mathfrak{V}$.

It is instructive to note the resemblance between this result and 
the corollary to Theorem 2.3.6 (p. 54). 

We need only prove the second assertion since the first then follows as a trivial consequence (normalizing any non-zero vector of $\mathfrak{V}$ gives an orthonormal set). Write $d(\mathfrak{V}) = r$ $(\ge 1)$ and let $x_1, \ldots, x_k$ be an orthonormal set of vectors in $\mathfrak{V}$, where $1 \le k < r$. By Corollary 1 to Theorem 2.5.4 and the corollary to Theorem 2.3.5 there exists a vector $x \in \mathfrak{V}$ such that $x_1, \ldots, x_k, x$ are linearly independent. The vector $y \in \mathfrak{V}$, given by

$$y = x - (x_1, x)x_1 - \dots - (x_k, x)x_k,$$

is then non-zero and is evidently orthogonal to each of $x_1, \ldots, x_k$. Since $y$ can be normalized we see that to every orthonormal set of $k$ $(< r)$ vectors in $\mathfrak{V}$ we can add a further vector such that the augmented set is again orthonormal.

The proof of the theorem is completed by a repeated application of this result.† For if $q_1, \ldots, q_m$ is an orthonormal set in $\mathfrak{V}$ and $m = r$, then the assertion is trivial, while if $m < r$, then we can successively find unit vectors $q_{m+1}, \ldots, q_r$ such that $q_1, \ldots, q_m, q_{m+1}, \ldots, q_r$ is an orthonormal set, and therefore an orthonormal basis of $\mathfrak{V}$.
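The proof of Theorem 2.5.5 is constructive. A minimal sketch in Python follows, assuming NumPy and taking $\mathfrak{V} = \mathfrak{V}_n$. That restriction lets $e_1, \ldots, e_n$ serve as the candidates $x$; this choice and the tolerance `tol` are assumptions, not part of the text. The sketch repeats the step $y = x - (x_1, x)x_1 - \dots - (x_k, x)x_k$, discards candidates for which $y$ vanishes, and normalizes the rest until $n$ vectors have been collected.

```python
import numpy as np

def extend_orthonormal(onset, n, tol=1e-10):
    """Extend an orthonormal list of vectors in V_n to an orthonormal basis.

    Mirrors the proof of Theorem 2.5.5: for a candidate x, the vector
    y = x - (q_1, x) q_1 - ... - (q_k, x) q_k is orthogonal to every q_i;
    if y is non-zero it is normalized and added.  Candidates are taken
    from e_1, ..., e_n (our choice), which span V_n.
    """
    basis = [np.asarray(q, dtype=complex) for q in onset]
    for i in range(n):
        if len(basis) == n:
            break
        x = np.zeros(n, dtype=complex)
        x[i] = 1
        y = x - sum((np.vdot(q, x) * q for q in basis), np.zeros(n, dtype=complex))
        norm = np.linalg.norm(y)
        if norm > tol:
            basis.append(y / norm)
    return basis
```

For example, starting from $(1, 1, 0)/\sqrt{2}$ in $\mathfrak{V}_3$, the candidate $e_2$ is rejected (its $y$ vanishes) and the sketch returns $(1,1,0)/\sqrt{2}$, $(1,-1,0)/\sqrt{2}$, $e_3$.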


† Compare the proof of Theorem 2.3.5.




Exercise 2.5.5. Give a geometrical interpretation for the construction of orthonormal bases in the real spaces $\mathfrak{V}_2$ and $\mathfrak{V}_3$.

The principle underlying the construction carried out in the proof above can be put even more explicitly. We are concerned with the problem of obtaining an orthogonal set of $n$ vectors (in $\mathfrak{V}_n$) from any given set of $n$ linearly independent vectors. The procedure by means of which this object is achieved, known as Schmidt's orthogonalization process, will be used in the proof of the next theorem.

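Theorem 2.5.6. If $x_1, \ldots, x_n$ are linearly independent vectors in $\mathfrak{V}_n$, then scalars $c_{ij}$ $(1 \le j < i \le n)$ can be found such that the vectors $y_1, \ldots, y_n$, given by the scheme

$$\begin{aligned} y_1 &= x_1, \\ y_2 &= c_{21}x_1 + x_2, \\ y_3 &= c_{31}x_1 + c_{32}x_2 + x_3, \\ &\;\;\vdots \\ y_n &= c_{n1}x_1 + c_{n2}x_2 + \dots + c_{n,n-1}x_{n-1} + x_n, \end{aligned} \qquad (2.5.4)$$

form an orthogonal set of non-zero vectors.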


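Consider the vectors $y_1, \ldots, y_n$ defined by the formulae

$$\begin{aligned} y_1 &= x_1, \\ y_2 &= x_2 - \frac{(y_1, x_2)}{(y_1, y_1)}\,y_1, \\ y_3 &= x_3 - \frac{(y_1, x_3)}{(y_1, y_1)}\,y_1 - \frac{(y_2, x_3)}{(y_2, y_2)}\,y_2, \\ &\;\;\vdots \\ y_n &= x_n - \frac{(y_1, x_n)}{(y_1, y_1)}\,y_1 - \dots - \frac{(y_{n-1}, x_n)}{(y_{n-1}, y_{n-1})}\,y_{n-1}. \end{aligned} \qquad (2.5.5)$$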


To show that these definitions have a meaning we must verify that each $y_i \ne 0$. This is done by induction. We clearly have $y_1 \ne 0$. Assume now that, for some $i \ge 2$, $y_1, \ldots, y_{i-1}$ are all non-zero. Then the definition of $y_i$ in (2.5.5) is significant and we recognize, furthermore, that $y_i$ is a linear combination of $x_1, \ldots, x_i$ in which $x_i$ has the coefficient 1. Since $x_1, \ldots, x_i$ are linearly independent it follows that $y_i \ne 0$. Thus $y_1 \ne 0, \ldots, y_n \ne 0$.

We next show, once again by induction, that $y_1, \ldots, y_n$ form an orthogonal set. Assume that, for some $k \ge 2$, $y_1, \ldots, y_{k-1}$ form an orthogonal set. Now

$$y_k = x_k - \sum_{i=1}^{k-1} \frac{(y_i, x_k)}{(y_i, y_i)}\,y_i,$$

and therefore, forming the inner product $(y_j, y_k)$ where $1 \le j < k$, we obtain

$$(y_j, y_k) = (y_j, x_k) - \sum_{i=1}^{k-1} \frac{(y_i, x_k)}{(y_i, y_i)}\,(y_j, y_i) = (y_j, x_k) - \frac{(y_j, x_k)}{(y_j, y_j)}\,(y_j, y_j) = 0.$$

Hence $y_1, \ldots, y_k$ form an orthogonal set, and so, by induction, $y_1, \ldots, y_n$ form an orthogonal set. To complete the proof of the theorem we need merely to note that $y_1, \ldots, y_n$, as defined by (2.5.5), can, in fact, be written in the form (2.5.4).†
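Formulae (2.5.5) translate line by line into code. The sketch below, which assumes NumPy, keeps the vectors $y_k$ unnormalized, exactly as in the text; `schmidt` is our name for it. As in (2.5.5), the numerator uses $(y_i, x_k)$. A common floating-point variant (usually called the modified process) uses the partially reduced vector instead; it gives the same result in exact arithmetic.

```python
import numpy as np

def schmidt(xs):
    """Schmidt's orthogonalization process, following formulae (2.5.5).

    y_k = x_k - sum_{i<k} (y_i, x_k) / (y_i, y_i) * y_i
    The y's are orthogonal but not normalized; linear independence of
    the x's guarantees (y_i, y_i) != 0, so no division by zero occurs.
    """
    ys = []
    for x in xs:
        x = np.asarray(x, dtype=complex)
        y = x.copy()
        for yi in ys:
            y = y - (np.vdot(yi, x) / np.vdot(yi, yi)) * yi   # np.vdot conjugates yi
        ys.append(y)
    return ys

ys = schmidt([(1, 1, 0), (1, 0, 1), (0, 1, 1)])
# y_1 = (1, 1, 0), y_2 = (1/2, -1/2, 1), y_3 = (-2/3, 2/3, 2/3)
```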


Exercise 2.5.6. Let $x_1, \ldots, x_n$ be linearly independent vectors in $\mathfrak{V}_n$, and suppose that the first $r$ of them form an orthogonal set. If $y_1, \ldots, y_n$ are the vectors defined by (2.5.5), show that $y_k = x_k$ $(k = 1, \ldots, r)$.

2.5.4. In the discussion below $U$ denotes a subspace of the total vector space $\mathfrak{V}_n$.

Definition 2.5.6. The orthogonal complement of $U$ is the set of all vectors in $\mathfrak{V}_n$ which are orthogonal to every vector in $U$.‡

Exercise 2.5.7. Show that the orthogonal complement of $\mathfrak{V}_n$ is the null space, and that the orthogonal complement of the null space is $\mathfrak{V}_n$.

Theorem 2.5.7. If $U'$ is the orthogonal complement of $U$, then (i) $U'$ is a vector space; (ii) the orthogonal complement of $U'$ is $U$; (iii) $d(U) + d(U') = n$.

The verification of (i) is immediate. To prove (ii) and (iii) write $d(U) = r$. The cases $r = 0$ and $r = n$ have already been dealt with in Exercise 2.5.7 and we may, therefore, assume that $0 < r < n$. Let $\{x_1, \ldots, x_r\}$ denote an orthonormal basis of $U$. By Theorem 2.5.5 there exist vectors $x_{r+1}, \ldots, x_n \in \mathfrak{V}_n$ such that $\{x_1, \ldots, x_r, x_{r+1}, \ldots, x_n\}$ is an orthonormal basis of $\mathfrak{V}_n$. Every $x \in \mathfrak{V}_n$ is then expressible in the form

$$x = \alpha_1 x_1 + \dots + \alpha_r x_r + \alpha_{r+1} x_{r+1} + \dots + \alpha_n x_n.$$

† For some interesting applications of Schmidt's orthogonalization process to certain infinitely-dimensional linear manifolds whose elements are functions see Jackson, Fourier Series and Orthogonal Polynomials, 151-4, and Courant and Hilbert, Methods of Mathematical Physics, vol. i, chap. ii, §§ 1, 8, 9.

‡ We do not know at this stage whether the orthogonal complement of $U$ is a complement of $U$. This question will be settled below.



By Exercise 2.5.2 (p. 65), $x \in U'$ if and only if

$$(x_1, x) = \dots = (x_r, x) = 0,$$

i.e., since $(x_i, x) = \alpha_i$ by orthonormality, if and only if $\alpha_1 = \dots = \alpha_r = 0$. Thus $x \in U'$ if and only if it is of the form

$$x = \alpha_{r+1} x_{r+1} + \dots + \alpha_n x_n.$$

Therefore $U'$ is the vector space spanned by $x_{r+1}, \ldots, x_n$. It follows by symmetry that the orthogonal complement of $U'$ is spanned by $x_1, \ldots, x_r$ and so is identical with $U$. Moreover, $d(U') = n - r$, and therefore $d(U) + d(U') = n$.
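For computation, the characterization used in the proof of Theorem 2.5.7 can be phrased as a homogeneous linear system: $x \in U'$ exactly when $(u, x) = 0$ for each vector $u$ of a basis of $U$. The sketch below assumes NumPy; `orthogonal_complement` and the tolerance `tol` are our own. It solves this system through a singular value decomposition rather than through the orthonormal completion used in the text, and returns an orthonormal basis of $U'$, so that the dimension count of Theorem 2.5.7 (iii) can be checked directly.

```python
import numpy as np

def orthogonal_complement(basis_u, n, tol=1e-10):
    """Orthonormal basis of U', the orthogonal complement of U in V_n.

    x lies in U' exactly when (u, x) = 0 for every basis vector u of U
    (Exercise 2.5.2), i.e. when conj(A) x = 0, where the rows of A are the
    basis vectors.  The null space of conj(A) is read off from its SVD.
    """
    if len(basis_u) == 0:
        return list(np.eye(n, dtype=complex))
    m = np.conj(np.array(basis_u, dtype=complex))
    _, s, vh = np.linalg.svd(m)                 # vh is n x n (full_matrices=True)
    rank = int(np.sum(s > tol))
    # Rows rank..n-1 of vh, conjugated, are the columns of V spanning the null space.
    return [row.conj() for row in vh[rank:]]

print(len(orthogonal_complement([(1, 1j)], 2)))   # 1, since d(U) + d(U') = 2
```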


Theorem 2.5.8. The orthogonal complement of $U$ is a complement of $U$.

Let $U'$ denote the orthogonal complement of $U$. If $x \in U$, $x \in U'$, then $(x, x) = 0$ and so $x = 0$. Thus $0$ is the only vector common to $U$ and $U'$. Moreover, by Theorem 2.5.7 (iii), $d(U) + d(U') = n$. Hence, by Theorem 2.3.7 (p. 55), $U$ and $U'$ are complements.


Exercise 2.5.8. Let $U$ be a subspace of $\mathfrak{V}_n$. Show that, given any vector $x \in \mathfrak{V}_n$, there exist unique vectors $y, z \in \mathfrak{V}_n$ such that

$$x = y + z, \quad y \in U, \quad (u, z) = 0 \ \text{for every } u \in U.$$
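The decomposition in Exercise 2.5.8, with $y$ in $U$ and $z$ orthogonal to every vector of $U$, can likewise be explored numerically. The sketch below assumes NumPy and uses its QR factorization, a library stand-in for Schmidt's process, to obtain an orthonormal basis $q_1, \ldots, q_r$ of $U$ from a given linearly independent basis. It then forms $y = (q_1, x)q_1 + \dots + (q_r, x)q_r$ and $z = x - y$. The helper name `split` is hypothetical, and the code illustrates the decomposition rather than proving its uniqueness.

```python
import numpy as np

def split(x, basis_u):
    """Write x = y + z with y in U and z orthogonal to every vector of U.

    The columns of q form an orthonormal basis q_1, ..., q_r of U
    (the basis vectors of U are assumed linearly independent); then
    y = (q_1, x) q_1 + ... + (q_r, x) q_r and z = x - y.
    """
    a = np.column_stack([np.asarray(v, dtype=complex) for v in basis_u])
    q, _ = np.linalg.qr(a)                  # reduced QR: q has shape (n, r)
    x = np.asarray(x, dtype=complex)
    y = q @ (q.conj().T @ x)                # entries of q.conj().T @ x are (q_i, x)
    return y, x - y

y, z = split((1, 2, 3), [(1, 1, 0)])
assert np.allclose(y, [1.5, 1.5, 0]) and abs(np.vdot([1, 1, 0], z)) < 1e-12
```

The signs of the columns returned by QR may vary, but $y$ does not depend on them.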

In our discussion of inner products we confined ourselves to 
vector spaces and did not attempt to consider linear manifolds. 
It should be pointed out, however, that the notion of inner product 
can be based on a set of abstract postulates and axiomatized in 
much the same way as the notions of vector addition and multiplication by scalars. Since it is not essential for our purpose to pursue this topic, we refer the reader elsewhere for details.†


PROBLEMS ON CHAPTER II 

1. Show that the set $S$ of numbers of the form $a + b\sqrt{-5}$, where $a$ and $b$ are arbitrary rational numbers, is a field. Show also that $S$ is not a field if $a$ and $b$ are allowed to take integral values only.

2. Show that if a number of vectors span a vector space U, then they 
contain a basis of U. 

3. Let $U$ be the set of all vectors $(x, y)$ whose components satisfy the relations $ax + by = 0$, $cx + dy = 0$, where $a, b, c, d$ are given numbers. Show that $U$ is a vector space and determine its dimensionality in terms of $a, b, c, d$.

4. Let $X_1, \ldots, X_k, Y_1, \ldots, Y_l$ be elements of a linear manifold. Let $\mu$ be the maximum number of linearly independent elements among $X_1, \ldots, X_k$ and $\nu$ the maximum number of linearly independent elements among $Y_1, \ldots, Y_l$.

† See Birkhoff and MacLane, 3, 183-9, and Stoll, 7, 214-25.




Show that the maximum number of linearly independent elements among $X_1, \ldots, X_k, Y_1, \ldots, Y_l$ does not exceed $\mu + \nu$.

5. Let $X_1, \ldots, X_k, Y_1, \ldots, Y_l$ be elements of a linear manifold and suppose that each $Y$ is a linear combination of the $X$'s. If $\mu$ denotes the maximum number of linearly independent elements among the $X$'s and $\nu$ the maximum number of linearly independent elements among the $Y$'s, show that $\nu \le \mu$.

6. Let $m \ge 2$, and let $X_1, \ldots, X_m$ be any non-zero elements of a linear manifold. Show that these elements are linearly dependent if and only if at least one of them is a linear combination of the elements preceding it. Hence deduce Theorem 2.3.6 (p. 54).

7. Show that a subset $\mathfrak{N}$ of a linear manifold $\mathfrak{M}$ is a submanifold if and only if it is closed with respect to multiplication by scalars and addition.

8. A subspace $U$ of $\mathfrak{V}_4$ is spanned by the vectors $(2, -1, 0, 1)$, $(6, 1, 4, -6)$, $(4, 1, 3, -4)$. Find an orthonormal basis of $U$.

9. Show that the vectors $(2, -3, 1)$, $(0, 1, 2)$, $(1, 1, -2)$ constitute a basis of $\mathfrak{V}_3$, and find the vector representing $(\alpha, \beta, \gamma)$ with respect to this basis.

10. Prove that, with respect to suitable definitions of multiplication by scalars and addition, the set of all polynomials in $t$, of degree $\le 2$, is a linear manifold $\mathfrak{M}$ of dimensionality 3. When do three given quadratic polynomials $f_1(t), f_2(t), f_3(t)$ constitute a basis of $\mathfrak{M}$?

Show that ^+1} is a basis of $\mathfrak{M}$, and obtain the vector representing $2t^2 - 7t + 3$ with respect to this basis.

11. Prove that, for any two complex vectors $x$ and $y$ (of the same order),

$$|x + y|^2 + |x - y|^2 = 2|x|^2 + 2|y|^2.$$

Give a geometric interpretation of this result for the case of real vectors of order 2.

12. Show that, for any orthogonal set of complex vectors $x_1, \ldots, x_k$,

$$\Big|\sum_{\nu=1}^{k} x_\nu\Big|^2 = \sum_{\nu=1}^{k} |x_\nu|^2.$$

Give a geometric interpretation of this result for the case of two real vectors of order 2.

13. Let $x_1, \ldots, x_k$ be an orthonormal set of complex vectors. Show that, for any vector $x$,

$$\sum_{\nu=1}^{k} |(x, x_\nu)|^2 \le |x|^2,$$

with equality if and only if $x$ is a linear combination of $x_1, \ldots, x_k$. (This is known as Bessel's inequality.) Hence deduce Theorem 2.5.2 (p. 63).

Also deduce that, if $\{x_1, \ldots, x_n\}$ is an orthonormal basis of $\mathfrak{V}_n$, then, for any $x \in \mathfrak{V}_n$,

$$\sum_{\nu=1}^{n} |(x_\nu, x)|^2 = |x|^2.$$

14. By considering the expression $|\lambda x + y|^2$ as a quadratic in $\lambda$, prove that, for all real vectors $x, y$,

$$|(x, y)| \le |x|\,|y|.$$

Extend the method of proof to the case of complex vectors.




15. Let $\{x_n\}$ be a sequence of vectors and $x$ some vector in a vector space. If, as $n \to \infty$, $|x_n - x| \to 0$, we say that $x_n \to x$. Prove the following results.

(i) If $x_n \to x$, $x_n \to x'$, then $x = x'$.

(ii) If $x_n \to x$, $y_n \to y$, then $\alpha x_n + \beta y_n \to \alpha x + \beta y$.

(iii) If $x_n \to x$, $y_n \to y$, then $(x_n, y_n) \to (x, y)$.

(iv) If $|x_m - x_n| \to 0$ as $m, n \to \infty$, then there exists a vector $x$ such that $x_n \to x$.

16. Show that if the subspaces $U, U'$ of $\mathfrak{V}_n$ are orthogonal complements, then $0$ is the only vector common to $U$ and $U'$.

17. Show that two subspaces $U, U'$ of $\mathfrak{V}_n$ are complements if and only if $0$ is the only vector common to $U$ and $U'$, and every $x \in \mathfrak{V}_n$ can be expressed in the form $x = y + y'$, where $y \in U$, $y' \in U'$.

18. Let $U, W$ be two subspaces of $\mathfrak{V}_n$ and let $W'$ be a complement of $W$. Show that, if $W \subset U$ and the only vector common to $U$ and $W'$ is $0$, then $U = W$.

19. Let $U, W$ be subspaces of $\mathfrak{V}_n$ and let $U', W'$ be their orthogonal complements. Show that $U \subset W$ implies $W' \subset U'$ and that $U \subset W'$ implies $W \subset U'$.

20. A subspace $U$ of $\mathfrak{V}_3$ consists of all vectors $(x_1, x_2, x_3)$ which satisfy the
relations Zx^-{-x^--x^ -- 0, Xi—5x2+x^ = 0. Determine a basis of the 
orthogonal complement of $U$.

21. The subspace $U$ of $\mathfrak{V}_4$ is spanned by the vectors $(1, 0, -1, 2)$ and $(-1, 1, 1, 0)$. Show that the orthogonal complement $U'$ of $U$ is the set of vectors of the form $(\alpha - 2\beta, -2\beta, \alpha, \beta)$, where $\alpha, \beta$ are arbitrary numbers. Also find an orthonormal basis of $U'$.

22. Let $U_1, \ldots, U_k$ be subspaces of the total vector space $\mathfrak{V}_n$, and suppose that every $x \in \mathfrak{V}_n$ can be expressed in the form $x = y_1 + \dots + y_k$, where $y_1 \in U_1, \ldots, y_k \in U_k$. Show that there exists a basis of $\mathfrak{V}_n$ every vector of which belongs to some $U_i$. Deduce that $d(U_1) + \dots + d(U_k) \ge n$.

23. Let $U, W$ be two subspaces of $\mathfrak{V}_n$. Let $U + W$ denote the set of all vectors in $\mathfrak{V}_n$ expressible in the form $x + y$, where $x \in U$, $y \in W$; and let $U \cap W$ denote the set of all vectors common to $U$ and $W$. Show that $U + W$ and $U \cap W$ are vector spaces, and that

$$d(U + W) + d(U \cap W) = d(U) + d(W).$$

What becomes of this result when $U, W$ are complements?



III


THE ALGEBRA OF MATRICES 

The algebra of matrices was first developed systematically by 
Cayley in a series of papers which began to appear in 1857, and most 
of the results derived in the present book were discovered during 
the second half of the last century. Our aim in this chapter is to 
consider in some detail the elementary notions in the theory of 
matrices. 

3.1. Elementary algebra 

Mathematicians often find themselves compelled to introduce 
new types of entities and to define operations applicable to these 
entities. It then becomes a matter of considerable importance to 
determine to what extent the operations so defined resemble the 
familiar operations of elementary algebra. Thus, in the previous
chapter we introduced certain objects called ‘vectors’ and, having
defined two operations (addition and multiplication by scalars) to
which vectors could be subjected, we proceeded to study the formal
nature of these operations. A similar programme is to be carried
out in the present chapter with respect to matrices, whose definition
we defer till the next section. There we shall also define operations
applicable to matrices and shall investigate to what extent the 
resulting matrix laws reflect the structure of the number system. 
It will, therefore, be useful at the present stage to enumerate 
briefly the basic algebraic laws valid for real and complex numbers. 

We recall, in the first place, that any two numbers $a$ and $b$ are
either equal ($a = b$) or unequal ($a \neq b$).

The fundamental operations in number algebra are those of
addition and multiplication, both operations being applicable to
pairs of numbers. The result of adding $a$ and $b$ is a number which
is called their sum and is denoted by $a+b$; the result of multiplying
$a$ and $b$ is a number which is called their product and is denoted
by $a.b$ or $ab$. Addition and multiplication are both commutative:

$$a+b = b+a, \qquad ab = ba.$$

In other words, it is immaterial in which order the terms of a sum
or the factors of a product are taken. Addition and multiplication
are, furthermore, both associative:

$$a+(b+c) = (a+b)+c, \tag{3.1.1}$$

$$a(bc) = (ab)c. \tag{3.1.2}$$

These results imply that we may, without danger of ambiguity,
remove the brackets and write simply $a+b+c$ for either side of
(3.1.1) and $abc$ for either side of (3.1.2). Hence we may also remove
brackets from expressions such as

$$\{a+(b+c)\}+d \quad\text{or}\quad \{(ab)(cd)\}e.$$

The operations of addition and multiplication are involved 
together in the distributive law: 

$$a(b+c) = ab+ac.$$

The number system contains two numbers, 1 (unity) and 0 (zero),
which have very special properties. Thus, for all $a$,

$$a+0 = a, \qquad 0a = 0, \qquad 1a = a.$$

Again, if $ab = 0$, then at least one of $a, b$ is equal to 0. This result
is known as the division law, and we often express it by saying that
the set of real (or complex) numbers has no divisors of zero.† A
consequence of this fact is the cancellation law, which states that if
$ax = ay$ and $a \neq 0$, then $x = y$.

† Cf. Definition 3.6.3 (p. 96).

The equation $a+x = b$ possesses a unique solution for $x$; this
solution is denoted by $x = b-a$; in particular we write $-a$ for
$0-a$. The number $b-a$ is called the difference of $b$ and $a$, and the
operation of forming differences is known as subtraction. Of the
algebraic laws involving subtraction we may mention the following:

$$a-a = 0, \qquad (-1)a = -a, \qquad a(-b) = -ab, \qquad a(b-c) = ab-ac.$$

If $a \neq 0$, the equation $ax = 1$ has a unique solution for $x$, which
is denoted by $x = \frac{1}{a}$ or $1/a$ or $a^{-1}$. The number $a^{-1}$ is called the
inverse or reciprocal of $a$. For $a \neq 0$ the equation $ax = b$ has the
unique solution $x = a^{-1}b$, which is usually denoted by $\frac{b}{a}$ or $b/a$. The
number $b/a$ is called the quotient of $b$ by $a$, and the operation of
forming quotients is known as division. The following identities are
typical of those involving division:


V2-U 
1 / 1 “^ (4 


When $r$ is a positive integer, $a^r$ is defined as $a.a \ldots a$ ($r$ factors).
When $a \neq 0$ and $r$ is a negative integer, $a^r$ is defined as $(a^{-1})^{-r}$.
Moreover, for every $a$, we put $a^0 = 1$. With these conventions we
have the following index laws, valid for all integers $r, s$ and all
numbers $a \neq 0$:

$$a^r a^s = a^{r+s}, \qquad (a^r)^s = a^{rs}.$$


3.2. Preliminary notions concerning matrices 

In this section we shall collect together for convenience a number 
of definitions relating to matrices and explain the notation that 
is to be used subsequently. 

The context in which matrices arise most naturally is that of 
linear substitutions. A linear substitution is a system of relations 
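between two sets of variables, say $x_1, \ldots, x_n$ and $y_1, \ldots, y_m$, of the form

$$\begin{aligned} y_1 &= a_{11}x_1+a_{12}x_2+\cdots+a_{1n}x_n,\\ y_2 &= a_{21}x_1+a_{22}x_2+\cdots+a_{2n}x_n,\\ &\;\;\vdots\\ y_m &= a_{m1}x_1+a_{m2}x_2+\cdots+a_{mn}x_n. \end{aligned} \tag{3.2.1}$$

The reader will readily call to mind situations where such systems
of relations make their appearance. We may conveniently characterize the system of relations (3.2.1) by isolating the array of
coefficients $a_{ij}$. This point of view, first adopted by Cayley, leads
at once to the notion of a matrix.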


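Definition 3.2.1. (i) An $m \times n$ matrix (or a matrix of type
$m \times n$) $\mathbf{A}$ over the field $\mathfrak{F}$ is a rectangular array of numbers in $\mathfrak{F}$,
consisting of $m$ rows and $n$ columns, say

$$\mathbf{A} = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & & \vdots\\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}. \tag{3.2.2}$$

(ii) The $mn$ numbers $a_{11}, a_{12}, \ldots, a_{mn}$ are called the elements of the
matrix $\mathbf{A}$.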

The term ‘matrix’ is due to Sylvester (1850). Matrices will
normally be denoted by capital letters printed in bold-face type.†
The element standing in the $i$th row and $j$th column (often called
the $(i,j)$th element) of a matrix $\mathbf{A}$ will be denoted by $\mathbf{A}_{ij}$. If, as
in (3.2.2), $\mathbf{A}_{ij} = a_{ij}$, we shall write $\mathbf{A} = (a_{ij})$. It is important to
remember that the first suffix of an element indicates the row and
the second the column of that element.

Definition 3.2.1 (i) implies that two matrices of the same type 
are equal if and only if all their corresponding elements are equal. 

In future we shall not, as a rule, mention the reference field $\mathfrak{F}$
explicitly but shall assume tacitly that such a field has been given.
If $\mathfrak{F}$ is the real (complex) field, a matrix $\mathbf{A}$ over $\mathfrak{F}$ is called a real
(complex) matrix.

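Definition 3.2.2. (i) A square matrix of order $n$ is a matrix
of type $n \times n$.

(ii) The elements $a_{11}, a_{22}, \ldots, a_{nn}$ of the square matrix

$$\begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}$$

constitute its diagonal, and are called diagonal elements.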

The reader is, in effect, already familiar with square matrices 
from the discussion of determinants. 

When we wish to emphasize that we are not necessarily dealing 
with square matrices we speak of rectangular matrices. 

Definition 3.2.3. The zero matrix (of type $m \times n$) is the $m \times n$
matrix all of whose elements are equal to zero. It is denoted by $\mathbf{O}_{m,n}$ or,
when no confusion is likely, by $\mathbf{O}$.

Definition 3.2.4. The unit matrix of order $n$ (denoted by $\mathbf{I}_n$
or simply by $\mathbf{I}$) is the square matrix of order $n$ whose diagonal elements
are equal to 1 and whose remaining elements are all equal to 0.


t The exceptions to this convention are explained after Definition 3.2.8. 




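Thus

$$\mathbf{I}_n = \begin{pmatrix} 1 & 0 & \cdots & 0\\ 0 & 1 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & 1 \end{pmatrix} = (\delta_{ij}),$$

where $\delta_{ij}$ is the Kronecker delta. We note that the linear substitution characterized by $\mathbf{I}_n$ has the form $y_1 = x_1,\ y_2 = x_2,\ \ldots,\ y_n = x_n$.
This is known as the identical substitution.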


Definition 3.2.5. (i) A diagonal matrix is a square matrix all
of whose elements outside the diagonal are equal to zero. (ii) A scalar
matrix is a diagonal matrix all of whose diagonal elements are equal
to one another.


Thus a diagonal matrix of order $n$ has the form

$$\begin{pmatrix} a_1 & 0 & \cdots & 0\\ 0 & a_2 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & a_n \end{pmatrix};$$

we shall denote this matrix by $\mathrm{dg}(a_1, a_2, \ldots, a_n)$. A scalar matrix has
the form $\mathrm{dg}(a, a, \ldots, a)$. Particular instances of scalar matrices are
the unit matrix and the square zero matrix. We shall presently see
that scalar matrices behave essentially like scalars.

Definition 3.2.6. An upper (lower) triangular matrix is a
square matrix all of whose elements below (above) the diagonal are
equal to zero. A triangular matrix is one that is either upper
triangular or lower triangular.

If a matrix is both upper triangular and lower triangular, it is 
clearly diagonal. 

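Definition 3.2.7. The transpose $\mathbf{A}^T$ of a matrix $\mathbf{A}$ is the matrix
obtained from $\mathbf{A}$ by the interchange of rows and columns. The
operation of deriving $\mathbf{A}^T$ from $\mathbf{A}$ is known as transposition.

Thus, if

$$\mathbf{A} = \begin{pmatrix} a_{11} & \cdots & a_{1n}\\ \vdots & & \vdots\\ a_{m1} & \cdots & a_{mn} \end{pmatrix}, \quad\text{then}\quad \mathbf{A}^T = \begin{pmatrix} a_{11} & \cdots & a_{m1}\\ \vdots & & \vdots\\ a_{1n} & \cdots & a_{mn} \end{pmatrix},$$

i.e. if $\mathbf{A} = (a_{ij})$, then $\mathbf{A}^T = (b_{ij})$, where $b_{ij} = a_{ji}$ ($i = 1,\ldots,n$;
$j = 1,\ldots,m$). The transpose of an $m \times n$ matrix is an $n \times m$ matrix.
The transpose of an upper triangular matrix is a lower triangular
matrix, and conversely.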
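The rule $b_{ij} = a_{ji}$ translates directly into code. In this minimal Python sketch a matrix is stored as a list of rows (our own choice of representation, not the book's), and the function name `transpose` is hypothetical; Python's built-in `zip(*A)` groups together the $j$th entries of all the rows, i.e. the $j$th column.

```python
def transpose(A):
    """Transpose of a matrix stored as a list of rows.

    Entry (i, j) of the result is entry (j, i) of A, so an m x n
    input gives an n x m output."""
    return [list(column) for column in zip(*A)]

A = [[1, 2, 3],
     [4, 5, 6]]            # a 2 x 3 matrix
print(transpose(A))        # [[1, 4], [2, 5], [3, 6]], a 3 x 2 matrix
```

Applying `transpose` twice returns the original list of rows, and an upper triangular input comes back lower triangular.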




Clearly $(\mathbf{A}^T)^T = \mathbf{A}$, and so the relation between $\mathbf{A}$ and $\mathbf{A}^T$ is
symmetrical, either matrix being the transpose of the other.

Definition 3.2.8. A column matrix is a matrix of type $n \times 1$, say

$$\begin{pmatrix} x_1\\ \vdots\\ x_n \end{pmatrix}.$$

A row matrix is a matrix of type $1 \times n$, say

$$(x_1 \;\; \ldots \;\; x_n).$$

Matrices of both these types may be regarded as vectors† and
referred to respectively as column vectors and row vectors. A matrix
may thus be thought of as a generalization of a vector.

In the majority of cases it is more convenient to use column
vectors in preference to row vectors. We shall, accordingly, adopt
the convention that, unless the contrary is stated, all vectors are
understood to be column vectors. In particular, the vectors $\mathbf{e}_1, \ldots, \mathbf{e}_n$
defined by (2.1.2) on p. 43 will from now on be regarded as column
vectors.

We shall continue to denote vectors by small letters in bold-face
type. Suppose, for instance, that

$$\mathbf{x} = \begin{pmatrix} x_1\\ \vdots\\ x_n \end{pmatrix}. \tag{3.2.3}$$

The transpose of a column vector is a row vector, and we shall then
have $\mathbf{x}^T = (x_1 \;\; \ldots \;\; x_n)$.
For typographical reasons it is generally preferable to avoid the form
(3.2.3) and to write instead

$$\mathbf{x} = (x_1 \;\; \ldots \;\; x_n)^T.$$

The notion of row vectors and column vectors enables us to
introduce a very convenient notation connected with matrices.

† Strictly speaking this assertion will have been justified only after the operations defined for vectors have been extended to matrices in § 3.3. It will then
also be seen that $1 \times 1$ matrices can, in fact, be treated as scalars. We shall make
use of this identification whenever it is convenient to do so.



Definition 3.2.9. The $i$th row of a matrix $\mathbf{A}$ (regarded as a row
vector) is denoted by $\mathbf{A}_{i*}$; the $j$th column of $\mathbf{A}$ (regarded as a column
vector) is denoted by $\mathbf{A}_{*j}$.

Exercise 3.2.1. Establish the identities 

3.3. Addition and multiplication of matrices 

3.3.1. Matrix algebra† is based on four operations: multiplication by scalars, addition, matrix multiplication, and transposition.
The first three of these will be discussed in the present section. It
should be noted that the definitions of multiplication by scalars
and of addition given below coincide with the previous definitions
(Definitions 2.1.4 and 2.1.6) for the special case when the matrices
are vectors.

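Definition 3.3.1. The product $\alpha\mathbf{A}$ (or $\mathbf{A}\alpha$) of an $m \times n$ matrix $\mathbf{A}$
by a scalar $\alpha$ is the matrix defined by the relations

$$(\alpha\mathbf{A})_{ij} = (\mathbf{A}\alpha)_{ij} = \alpha\mathbf{A}_{ij} \qquad (i = 1,\ldots,m;\ j = 1,\ldots,n).$$

We obtain at once a number of obvious results, such as

$$1\mathbf{A} = \mathbf{A}, \qquad (-1)\mathbf{A} = (-1)(a_{ij}) = (-a_{ij}), \qquad 0\mathbf{A} = \mathbf{O}, \qquad \alpha\mathbf{O} = \mathbf{O},$$

$$(\alpha\beta)\mathbf{A} = \alpha(\beta\mathbf{A}), \qquad \alpha\mathbf{I} = \mathrm{dg}(\alpha, \ldots, \alpha).$$

The last result shows that a scalar matrix is a scalar multiple of the
unit matrix.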

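Definition 3.3.2. Let $\mathbf{A}, \mathbf{B}$ be matrices of the same type, say of
type $m \times n$. Then their sum $\mathbf{A}+\mathbf{B}$ is the matrix defined by the relations

$$(\mathbf{A}+\mathbf{B})_{ij} = \mathbf{A}_{ij}+\mathbf{B}_{ij} \qquad (i = 1,\ldots,m;\ j = 1,\ldots,n).$$

Thus, if $\mathbf{A} = (a_{ij})$, $\mathbf{B} = (b_{ij})$, then $\mathbf{A}+\mathbf{B} = (a_{ij}+b_{ij})$. For
instance

$$\begin{pmatrix} 2 & 3 & -1\\ 0 & 1 & 2 \end{pmatrix} + \begin{pmatrix} 0 & -1 & 1\\ 2 & 0 & 3 \end{pmatrix} = \begin{pmatrix} 2 & 2 & 0\\ 2 & 1 & 5 \end{pmatrix}.$$

When $\mathbf{A}$ and $\mathbf{B}$ are not of the same type, $\mathbf{A}+\mathbf{B}$ is not defined.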

† Matrix algebra can be studied at different levels of sophistication. An
elementary exposition will be found, for instance, in Durell and Robson, 9, chap.
xvii. For treatment from an advanced point of view, on the other hand, see
MacDuffee, 23, chap. vii, or Albert, 24, chaps.



We note that we can now form linear combinations of matrices,
i.e. expressions of the form

$$\alpha_1\mathbf{A}_1+\cdots+\alpha_k\mathbf{A}_k,$$

provided only that $\mathbf{A}_1, \ldots, \mathbf{A}_k$ are of the same type.

The usual algebraic laws relating to the two operations so far
introduced follow at once from the definitions.† Some of these are
incorporated in the next theorem.

Theorem 3.3.1. Matrix addition is commutative and associative,
i.e.

$$\mathbf{A}+\mathbf{B} = \mathbf{B}+\mathbf{A}, \qquad \mathbf{A}+(\mathbf{B}+\mathbf{C}) = (\mathbf{A}+\mathbf{B})+\mathbf{C}.$$

Furthermore, matrix addition and multiplication by scalars are
connected by the following distributive laws:

$$\alpha(\mathbf{A}+\mathbf{B}) = \alpha\mathbf{A}+\alpha\mathbf{B}, \qquad (\alpha+\beta)\mathbf{A} = \alpha\mathbf{A}+\beta\mathbf{A}.$$

The associative law implies, in particular, that when we have an 
expression involving only addition of matrices and containing 
brackets, all such brackets can be removed without danger of 
ambiguity. 

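There is no difficulty in introducing subtraction of matrices. By
Definition 3.3.2 it is obvious that if $\mathbf{A}, \mathbf{B}$ are matrices of the same
type, the equation $\mathbf{A}+\mathbf{X} = \mathbf{B}$ possesses a unique solution for $\mathbf{X}$;
this solution is denoted by $\mathbf{X} = \mathbf{B}-\mathbf{A}$ and is called the difference
of $\mathbf{B}$ and $\mathbf{A}$. In fact, if $\mathbf{A} = (a_{ij})$, $\mathbf{B} = (b_{ij})$, then $\mathbf{X} = (b_{ij}-a_{ij})$.

For $\mathbf{O}-\mathbf{A}$ we write $-\mathbf{A}$. It is easy to see that subtraction satisfies
all the usual rules. We have, for instance, the following identities:

$$\mathbf{A}-\mathbf{A} = \mathbf{O}, \qquad (-1)\mathbf{A} = -\mathbf{A}, \qquad -(-\mathbf{A}) = \mathbf{A},$$

$$\alpha(\mathbf{A}-\mathbf{B}) = \alpha\mathbf{A}-\alpha\mathbf{B}, \qquad (\alpha-\beta)\mathbf{A} = \alpha\mathbf{A}-\beta\mathbf{A}.$$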

Exercise 3.3.1. Verify that the scalar matrices of order $n$ form a one-dimensional linear manifold $\mathfrak{M}$ and that the scalars form a linear manifold
$\mathfrak{M}'$. Prove that $\mathfrak{M} \cong \mathfrak{M}'$.

t The reader is advised to write out in detail the proofs of the results given 
below and to observe how the various laws relating to matrices are derived from 
analogous laws of number algebra. 



Exercise 3.3.2. Show that the set of all $m \times n$ matrices over $\mathfrak{F}$ is
a linear manifold $\mathfrak{M}_{m,n}$ over $\mathfrak{F}$.† By considering the $mn$ matrices of $\mathfrak{M}_{m,n}$ whose
elements consist of a single 1 and $mn-1$ zeros, or by using the corollary to
Theorem 2.4.2 (p. 69), show that $d(\mathfrak{M}_{m,n}) = mn$.

3.3.2. Matrix addition and multiplication of matrices by scalars 
are obvious extensions of the corresponding ideas for vectors. In 
introducing next the idea of matrix multiplication, however, we 
break fresh ground. 

Definition 3.3.3. Suppose that $\mathbf{A}$ is an $l \times m$ matrix and $\mathbf{B}$ an
$m \times n$ matrix, so that the number of columns of $\mathbf{A}$ is equal to the number
of rows of $\mathbf{B}$. Then the matrix product $\mathbf{AB}$ is the $l \times n$ matrix defined
by the relations

$$(\mathbf{AB})_{ij} = \sum_{k=1}^{m} \mathbf{A}_{ik}\mathbf{B}_{kj} \qquad (i = 1,\ldots,l;\ j = 1,\ldots,n).$$

In other words, the $(i,j)$th element of $\mathbf{AB}$ is obtained by multiplying together corresponding elements in the $i$th row of $\mathbf{A}$ and
the $j$th column of $\mathbf{B}$ and then adding the products.
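Definition 3.3.3 translates almost word for word into code. The Python sketch below (the function name `matmul` and the list-of-rows representation are our own choices, not anything in the text) forms each $(i,j)$th element as the sum of the products of corresponding elements of the $i$th row of A and the $j$th column of B, and refuses to multiply when the number of columns of A differs from the number of rows of B. In practice one would use a library routine, such as NumPy's `@` operator, instead.

```python
def matmul(A, B):
    """Matrix product AB for A (l x m) and B (m x n), stored as lists of rows.

    Implements (AB)_ij = sum over k of A_ik * B_kj directly."""
    l, m = len(A), len(A[0])
    if len(B) != m:
        raise ValueError(f"AB is not defined: A has {m} columns, B has {len(B)} rows")
    n = len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(n)]
            for i in range(l)]

print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))   # [[19, 22], [43, 50]]
```

For instance, the top-left entry is $1\cdot 5+2\cdot 7 = 19$: the first row of A against the first column of B.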

Exercise 3.3.3. Show that, if the product $\mathbf{AB}$ exists, then

$$(\mathbf{AB})_{ij} = \mathbf{A}_{i*}\mathbf{B}_{*j}.$$

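We note that $\mathbf{AB}$, when it exists, has as many rows as $\mathbf{A}$ and as
many columns as $\mathbf{B}$. Thus, for instance, the product of a $4 \times 2$
matrix and a $2 \times 3$ matrix (in that order) is a $4 \times 3$ matrix, e.g.

$$\begin{pmatrix} 2 & 1\\ -1 & 3\\ 1 & 0\\ 5 & 2 \end{pmatrix} \begin{pmatrix} 1 & 1 & -2\\ 3 & -4 & 3 \end{pmatrix} = \begin{pmatrix} 5 & -2 & -1\\ 8 & -13 & 11\\ 1 & 1 & -2\\ 11 & -3 & -4 \end{pmatrix}.$$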

If the number of columns of A is not equal to the number of rows 
of B, then AB is not defined. 

The two products $\mathbf{AB}$ and $\mathbf{BA}$ are quite distinct entities and,
indeed, one of them may exist whereas the other does not. The
condition for both $\mathbf{AB}$ and $\mathbf{BA}$ to exist is that, if $\mathbf{A}$ is of type $m \times n$,
$\mathbf{B}$ should be of type $n \times m$. In that case $\mathbf{AB}$, $\mathbf{BA}$ are of type $m \times m$,
$n \times n$ respectively, and consequently the two products cannot even
be compared unless $m = n$. The question whether $\mathbf{AB}$ and $\mathbf{BA}$ are
equal only arises, therefore, when $\mathbf{A}$ and $\mathbf{B}$ are square matrices of
the same order. Analogy with elementary algebra may lead us to
expect that in this case the two products are necessarily equal, but
this is not in fact so.

† We could have begun our discussion of matrices by proving this result as
soon as the operations of addition and multiplication by scalars had been defined.
It would then have been unnecessary to verify successively all the various laws
relating to these two operations since these laws would have been known to hold
by virtue of the results derived in § 2.2.


Theorem 3.3.2. Matrix multiplication is non-commutative, i.e.
the equation $\mathbf{AB} = \mathbf{BA}$ need not be satisfied even when both $\mathbf{AB}$ and
$\mathbf{BA}$ exist and are of the same type.

To establish this negative conclusion we need only construct a
single example where $\mathbf{AB} \neq \mathbf{BA}$. In fact, almost any two square
matrices taken at random satisfy this requirement. Thus, taking 



In view of Theorem 3.3.2 it is essential when referring to a matrix 
multiplication to state unambiguously the order in which the 
factors are taken. 
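A concrete pair of non-commuting matrices can be checked in a few lines. The two $2 \times 2$ matrices below are our own illustrative choice, not the pair used in the book; `matmul` is the same hypothetical list-of-rows helper that implements Definition 3.3.3.

```python
def matmul(A, B):
    # (AB)_ij = sum_k A_ik B_kj for matrices stored as lists of rows
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[1, 1],
     [0, 1]]
B = [[1, 0],
     [1, 1]]
print(matmul(A, B))   # [[2, 1], [1, 1]]
print(matmul(B, A))   # [[1, 1], [1, 2]]
```

Both products exist and are of the same type, yet they differ, which is all that Theorem 3.3.2 asserts.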


Definition 3.3.4. The matrix A is said to premultiply B and 
B is said to postmultiply A in the product AB. 

Definition 3.3.5. Two matrices $\mathbf{A}, \mathbf{B}$ commute (with each other),
or are commuting matrices, if $\mathbf{AB} = \mathbf{BA}$.

Exercise 3.3.4. Let $\mathbf{A}$ be a diagonal matrix whose diagonal elements are
distinct. Show that if $\mathbf{B}$ commutes with $\mathbf{A}$, then $\mathbf{B}$ is also a diagonal matrix.

Although multiplication of matrices differs from that of numbers 
in that it does not obey the commutative law, in most other respects 
the two operations have similar properties. 

Theorem 3.3.3. Matrix multiplication is associative, i.e.

$$\mathbf{A}(\mathbf{BC}) = (\mathbf{AB})\mathbf{C}, \tag{3.3.1}$$

provided that either side of (3.3.1) is defined.

Let the matrices $\mathbf{A}, \mathbf{B}, \mathbf{C}$ be of type $p \times q$, $r \times s$, $t \times u$ respectively.
Then $\mathbf{BC}$ exists and is of type $r \times u$ provided that $s = t$. In that case
$\mathbf{A}(\mathbf{BC})$ exists and is of type $p \times u$ provided that $q = r$. Thus
$\mathbf{A}(\mathbf{BC})$ exists if and only if $q = r$, $s = t$, and in that case it is of type
$p \times u$. In exactly the same way we see that the same is true of
$(\mathbf{AB})\mathbf{C}$. Hence the existence of either side of (3.3.1) implies the
existence of the other side, and the matrices on the two sides are
then of the same type. It now remains to prove that they are
equal. Writing $\mathbf{A} = (a_{ij})$ and so on, we have, in view of the
associative and the distributive laws for numbers,

$$\{\mathbf{A}(\mathbf{BC})\}_{ij} = \sum_k a_{ik}(\mathbf{BC})_{kj} = \sum_k a_{ik}\sum_l b_{kl}c_{lj} = \sum_{k,l} a_{ik}b_{kl}c_{lj},$$

where $k, l$ run over the appropriate ranges of values.† Similarly

$$\{(\mathbf{AB})\mathbf{C}\}_{ij} = \sum_l (\mathbf{AB})_{il}c_{lj} = \sum_l \Big(\sum_k a_{ik}b_{kl}\Big)c_{lj} = \sum_{k,l} a_{ik}b_{kl}c_{lj},$$

and the proof is therefore complete.

The result just proved shows that the matrix product

$$\mathbf{A}(\mathbf{BC}) = (\mathbf{AB})\mathbf{C}$$

can be written unambiguously as $\mathbf{ABC}$.

The associative law extends immediately to any number of
factors. Thus we have, for instance,

$$(\mathbf{AB})(\mathbf{CD}) = \mathbf{A}(\mathbf{BC})\mathbf{D} = (\mathbf{ABC})\mathbf{D} = \mathbf{A}(\mathbf{BCD}),$$

provided that any one of these expressions is defined. In fact, we
can write $\mathbf{ABCD}$ for the product in question. Quite generally we
may say that in forming products of matrices we need only pay
attention to the order of the factors but not to the way in which
they are bracketed.

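We saw above that

$$(\mathbf{ABC})_{ij} = \sum_{k,l} \mathbf{A}_{ik}\mathbf{B}_{kl}\mathbf{C}_{lj}.$$

Analogous formulae hold for products of more than three factors; for example,

$$(\mathbf{ABCD})_{ij} = \sum_{k,l,m} \mathbf{A}_{ik}\mathbf{B}_{kl}\mathbf{C}_{lm}\mathbf{D}_{mj}.$$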
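The associative law and the double-sum formula for $(\mathbf{ABC})_{ij}$ can be spot-checked on random integer matrices, where Python's integer arithmetic is exact. This is a hypothetical sketch for illustration only (the helper names and the fixed random seed are our own); a passing check on examples proves nothing, but it shows concretely that both bracketings and the explicit sum over $k$ and $l$ agree.

```python
import random

def matmul(A, B):
    # (AB)_ij = sum_k A_ik B_kj for matrices stored as lists of rows
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def random_matrix(rows, cols, rng):
    return [[rng.randint(-5, 5) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
p, q, s, u = 2, 3, 4, 2            # A is p x q, B is q x s, C is s x u
A = random_matrix(p, q, rng)
B = random_matrix(q, s, rng)
C = random_matrix(s, u, rng)

left = matmul(A, matmul(B, C))     # A(BC)
right = matmul(matmul(A, B), C)    # (AB)C
# (ABC)_ij as the double sum over k and l
explicit = [[sum(A[i][k] * B[k][l] * C[l][j] for k in range(q) for l in range(s))
             for j in range(u)] for i in range(p)]
print(left == right == explicit)   # True
```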

Exercise 3.3.5. If $\mathbf{A}_1, \ldots, \mathbf{A}_k$ are matrices of type $m_1 \times n_1, \ldots, m_k \times n_k$
respectively, what is the condition for the product $\mathbf{A}_1 \cdots \mathbf{A}_k$ to exist; and,
when it exists, what is its type?

A matrix can be multiplied by itself if and only if it is a square
matrix, and in that case the index notation can be conveniently
employed.

Definition 3.3.6. If $\mathbf{A}$ is a square matrix of order $n$, then

$$\mathbf{A}^0 = \mathbf{I}_n, \qquad \mathbf{A}^r = \mathbf{A}^{r-1}\mathbf{A} \quad (r \geq 1).$$

† In fact, $k = 1, 2, \ldots, q$; $l = 1, 2, \ldots, s$.
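In virtue of the associative law we have, in fact,

$$\mathbf{A}^1 = \mathbf{A}, \quad \mathbf{A}^2 = \mathbf{AA}, \quad \mathbf{A}^3 = \mathbf{AAA}, \quad \mathbf{A}^4 = \mathbf{AAAA},$$

and so on. It is now an easy matter to verify the index laws

$$\mathbf{A}^r\mathbf{A}^s = \mathbf{A}^{r+s}, \qquad (\mathbf{A}^r)^s = \mathbf{A}^{rs}$$

for all non-negative integers $r, s$.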
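Computing $\mathbf{A}^r$ straight from the recursive definition takes $r-1$ multiplications. A common shortcut, repeated squaring, is not discussed in the text, but it illustrates why associativity matters: it regroups the $r$ factors of $\mathbf{A}^r$ into products of $\mathbf{A}, \mathbf{A}^2, \mathbf{A}^4, \ldots$, which is legitimate only because the bracketing of a product does not affect its value (and all the factors are powers of $\mathbf{A}$, so their order is immaterial). In this hypothetical sketch (`mat_pow` and `identity` are our own names), $\mathbf{A}^0$ is taken to be $\mathbf{I}$, the convention the index laws need when $r$ or $s$ is 0.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def mat_pow(A, r):
    """A^r for a square matrix A and an integer r >= 0, with A^0 = I."""
    if r < 0:
        raise ValueError("only non-negative powers are handled here")
    result, base = identity(len(A)), A
    while r:
        if r & 1:                   # current binary digit of r is 1
            result = matmul(result, base)
        base = matmul(base, base)   # base runs through A, A^2, A^4, ...
        r >>= 1
    return result

F = [[1, 1], [1, 0]]
print(mat_pow(F, 5))   # [[8, 5], [5, 3]]
```

The index laws can then be checked directly, for example by comparing `matmul(mat_pow(F, 2), mat_pow(F, 3))` with `mat_pow(F, 5)`.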

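The zero matrix and the unit matrix play a particularly interesting part in matrix algebra. If $\mathbf{A}$ is any $m \times n$ matrix, then clearly

$$\mathbf{A}+\mathbf{O}_{m,n} = \mathbf{A}, \qquad \mathbf{A}\mathbf{O}_{n,p} = \mathbf{O}_{m,p}, \qquad \mathbf{O}_{p,m}\mathbf{A} = \mathbf{O}_{p,n}.$$

Moreover, since $\mathbf{I}_n$ is the $n \times n$ matrix $(\delta_{ij})$, we have

$$(\mathbf{A}\mathbf{I}_n)_{ij} = \sum_{k=1}^{n} a_{ik}\delta_{kj} = a_{ij}.$$

Hence

$$\mathbf{A}\mathbf{I}_n = \mathbf{A}, \tag{3.3.2}$$

and similarly

$$\mathbf{I}_m\mathbf{A} = \mathbf{A}. \tag{3.3.3}$$

These results show that in matrix algebra the zero matrix and
the unit matrix play roles corresponding to those of the numbers
0 and 1 in elementary algebra.

We may also note that, in view of (3.3.2) and (3.3.3), multiplication by scalars can be interpreted as matrix multiplication, for we have

$$\alpha\mathbf{A} = (\alpha\mathbf{I}_m)\mathbf{A} = \mathbf{A}(\alpha\mathbf{I}_n).$$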


Theorem 3.3.4. Matrix multiplication is distributive with respect
to addition, i.e.

$$\mathbf{A}(\mathbf{B}+\mathbf{C}) = \mathbf{AB}+\mathbf{AC}, \tag{3.3.4}$$

provided that either side of (3.3.4) is defined; and

$$(\mathbf{B}+\mathbf{C})\mathbf{A} = \mathbf{BA}+\mathbf{CA}, \tag{3.3.5}$$

provided that either side of (3.3.5) is defined.

We confine ourselves to the proof of the first identity, since the
second can be dealt with in exactly the same manner. Let $\mathbf{A}, \mathbf{B}, \mathbf{C}$
be matrices of type $p \times q$, $r \times s$, $t \times u$ respectively. It is easily seen
that either side of (3.3.4) is defined if and only if $q = r = t$, $s = u$,
and that in that case both sides are $p \times u$ matrices. It remains,
therefore, to show that the corresponding elements in the two
matrices are equal. We write $\mathbf{A} = (a_{ij})$, and so on. Making use of
Definitions 3.3.3 and 3.3.2 and the distributive law for numbers,
we obtain

$$\{\mathbf{A}(\mathbf{B}+\mathbf{C})\}_{ij} = \sum_k a_{ik}(b_{kj}+c_{kj}) = \sum_k a_{ik}b_{kj}+\sum_k a_{ik}c_{kj} = (\mathbf{AB})_{ij}+(\mathbf{AC})_{ij} = (\mathbf{AB}+\mathbf{AC})_{ij}.$$

The proof is therefore complete.


The distributive law can at once be extended to more complicated 
expressions. For example, we have 

(A+B)(C+D+E) = AC+AD+AE+BC+BD+BE. 

Exercise 3.3.6. Show that the set of all matrices which commute with
a given matrix is closed with respect to addition and multiplication.


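3.3.3. The results contained in the next two theorems are
almost trivial. Thus, for instance, Theorem 3.3.5 (i) merely states
that the $i$th row of $\mathbf{AB}$ can be obtained by premultiplying $\mathbf{B}$ by the
$i$th row of $\mathbf{A}$. Nevertheless identities such as these are useful since
they help to reduce the manipulation of matrices to a purely
mechanical procedure.

Theorem 3.3.5.

(i) $(\mathbf{AB})_{i*} = \mathbf{A}_{i*}\mathbf{B}$;

(ii) $(\mathbf{AB})_{*j} = \mathbf{A}\mathbf{B}_{*j}$;

(iii) $(\mathbf{ABC})_{ij} = \mathbf{A}_{i*}\mathbf{B}\mathbf{C}_{*j}$.

By Exercise 3.3.3 (p. 80) we have

$$(\mathbf{AB})_{ij} = \mathbf{A}_{i*}\mathbf{B}_{*j}. \tag{3.3.6}$$

Hence

$$(\mathbf{A}_{i*}\mathbf{B})_{1j} = (\mathbf{A}_{i*})_{1*}\mathbf{B}_{*j} = \mathbf{A}_{i*}\mathbf{B}_{*j} = (\mathbf{AB})_{ij}.$$

Moreover $\{(\mathbf{AB})_{i*}\}_{1j} = (\mathbf{AB})_{ij}$, and so, for all $j$, $\{(\mathbf{AB})_{i*}\}_{1j} = (\mathbf{A}_{i*}\mathbf{B})_{1j}$.

Since both sides of (i) are row vectors, this identity is now proved.
Identity (ii) is proved similarly. Again, using (3.3.6) and (i), we have

$$(\mathbf{ABC})_{ij} = (\mathbf{AB})_{i*}\mathbf{C}_{*j} = \mathbf{A}_{i*}\mathbf{B}\mathbf{C}_{*j}.$$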
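Theorem 3.3.5 can be watched at work with NumPy (assumed installed). NumPy does not distinguish row vectors from column vectors: a 1-D slice such as `A[i, :]` behaves as a row vector when it stands on the left of `@` and as a column vector when it stands on the right, which is exactly the behaviour the three identities need. The function name `check_row_column_identities` is our own.

```python
import numpy as np

def check_row_column_identities(A, B, C):
    """Check (i) (AB)_{i*} = A_{i*} B, (ii) (AB)_{*j} = A B_{*j} and
    (iii) (ABC)_{ij} = A_{i*} B C_{*j} for compatible integer matrices."""
    A, B, C = map(np.asarray, (A, B, C))
    AB = A @ B
    ABC = AB @ C
    rows_ok = all(np.array_equal(AB[i, :], A[i, :] @ B) for i in range(A.shape[0]))
    cols_ok = all(np.array_equal(AB[:, j], A @ B[:, j]) for j in range(B.shape[1]))
    elems_ok = all(ABC[i, j] == A[i, :] @ B @ C[:, j]
                   for i in range(A.shape[0]) for j in range(C.shape[1]))
    return rows_ok and cols_ok and elems_ok

A = [[1, 2, 0], [3, -1, 4]]        # 2 x 3
B = [[2, 1], [0, 5], [-1, 3]]      # 3 x 2
C = [[1, -2], [4, 0]]              # 2 x 2
print(check_row_column_identities(A, B, C))   # True
```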

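Theorem 3.3.6. If $\mathbf{A}$ is an $m \times n$ matrix, and $\mathbf{x} = (x_1 \;\ldots\; x_n)^T$,
$\mathbf{y} = (y_1 \;\ldots\; y_m)^T$, then

$$\mathbf{Ax} = x_1\mathbf{A}_{*1}+\cdots+x_n\mathbf{A}_{*n}, \qquad \mathbf{y}^T\mathbf{A} = y_1\mathbf{A}_{1*}+\cdots+y_m\mathbf{A}_{m*}.$$

Let $\mathbf{A} = (a_{ij})$. Then

$$x_1\mathbf{A}_{*1}+\cdots+x_n\mathbf{A}_{*n} = x_1\begin{pmatrix}a_{11}\\ \vdots\\ a_{m1}\end{pmatrix}+\cdots+x_n\begin{pmatrix}a_{1n}\\ \vdots\\ a_{mn}\end{pmatrix} = \begin{pmatrix}a_{11}x_1+\cdots+a_{1n}x_n\\ \vdots\\ a_{m1}x_1+\cdots+a_{mn}x_n\end{pmatrix} = \mathbf{Ax},$$

and the second assertion is established similarly.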
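Theorem 3.3.6 says that $\mathbf{Ax}$ is a linear combination of the columns of $\mathbf{A}$ and $\mathbf{y}^T\mathbf{A}$ a linear combination of its rows. The NumPy sketch below (NumPy assumed available; `combine_columns` and `combine_rows` are hypothetical names) builds both combinations term by term and compares them with the ordinary products.

```python
import numpy as np

def combine_columns(A, x):
    """x_1 A_{*1} + ... + x_n A_{*n}: a linear combination of the columns of A."""
    A = np.asarray(A)
    total = np.zeros(A.shape[0], dtype=A.dtype)
    for j, xj in enumerate(x):
        total = total + xj * A[:, j]
    return total

def combine_rows(A, y):
    """y_1 A_{1*} + ... + y_m A_{m*}: a linear combination of the rows of A."""
    A = np.asarray(A)
    total = np.zeros(A.shape[1], dtype=A.dtype)
    for i, yi in enumerate(y):
        total = total + yi * A[i, :]
    return total

A = np.array([[1, 2, 3], [4, 5, 6]])   # 2 x 3
x = np.array([1, 0, -1])
y = np.array([2, -1])
print(combine_columns(A, x), A @ x)    # both give the vector (-2, -2)
print(combine_rows(A, y), y @ A)       # both give the vector (-2, -1, 0)
```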

Exercise 3.3.7. Show that the inner product $(\mathbf{x},\mathbf{y})$ (as given by Definition 2.6.1, p. 62) satisfies the identity $(\mathbf{x},\mathbf{y}) = \mathbf{x}^T\bar{\mathbf{y}} = \bar{\mathbf{y}}^T\mathbf{x}$.†

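Exercise 3.3.8. Let $\mathbf{A}$ be an $n \times n$ matrix such that $\mathbf{Ax} = \mathbf{0}$ for all
vectors $\mathbf{x}$ (of order $n$). By taking, in turn, $\mathbf{x} = \mathbf{e}_1, \ldots, \mathbf{x} = \mathbf{e}_n$, show that
$\mathbf{A} = \mathbf{O}$.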

Exercise 3.3.9. Show that, if $\mathbf{A}$ is an $n \times n$ matrix, then
$\mathbf{e}_i^T\mathbf{A}\mathbf{e}_j = \mathbf{A}_{ij}$ ($i, j = 1,\ldots,n$).


3.4. Application of matrix technique to linear substitutions 

We have now carried the discussion of matrix algebra a fairly
long way without, as yet, having shown much need for the introduction and development of the notion of a matrix. We know, indeed,
that the $m \times n$ matrix $\mathbf{A} = (a_{ij})$ provides a convenient means for
specifying the system of relations

$$\begin{aligned} y_1 &= a_{11}x_1+\cdots+a_{1n}x_n,\\ &\;\;\vdots\\ y_m &= a_{m1}x_1+\cdots+a_{mn}x_n. \end{aligned} \tag{3.4.1}$$

It is easily seen, moreover, that (3.4.1) can be written as a single
matrix equation

$$\mathbf{y} = \mathbf{Ax}, \tag{3.4.2}$$

where $\mathbf{x} = (x_1, \ldots, x_n)^T$, $\mathbf{y} = (y_1, \ldots, y_m)^T$. Nevertheless, it is not in
the first place obvious that the introduction of matrices gives us
anything more than a contracted notation for systems of relations
such as (3.4.1). However, we are now in a position to demonstrate
that the advantages arising from the use of matrices are not purely
notational and that matrix technique enables us to handle systems
of the type of (3.4.1) rapidly and efficiently.


† If $\mathbf{x} = (x_1, \ldots, x_n)$, then $\bar{\mathbf{x}}$ denotes the vector $(\bar{x}_1, \ldots, \bar{x}_n)$; and similarly for
column vectors.




Suppose, for example, that

$$\begin{aligned} y_1 &= a_{11}x_1+a_{12}x_2+a_{13}x_3,\\ y_2 &= a_{21}x_1+a_{22}x_2+a_{23}x_3, \end{aligned} \tag{3.4.3}$$

and

$$\begin{aligned} z_1 &= b_{11}y_1+b_{12}y_2,\\ z_2 &= b_{21}y_1+b_{22}y_2,\\ z_3 &= b_{31}y_1+b_{32}y_2,\\ z_4 &= b_{41}y_1+b_{42}y_2. \end{aligned} \tag{3.4.4}$$

Substituting (3.4.3) in (3.4.4) we obtain

$$z_i = b_{i1}(a_{11}x_1+a_{12}x_2+a_{13}x_3)+b_{i2}(a_{21}x_1+a_{22}x_2+a_{23}x_3) \qquad (i = 1,2,3,4).$$

Thus

$$\begin{aligned} z_1 &= (b_{11}a_{11}+b_{12}a_{21})x_1+(b_{11}a_{12}+b_{12}a_{22})x_2+(b_{11}a_{13}+b_{12}a_{23})x_3,\\ z_2 &= (b_{21}a_{11}+b_{22}a_{21})x_1+(b_{21}a_{12}+b_{22}a_{22})x_2+(b_{21}a_{13}+b_{22}a_{23})x_3,\\ z_3 &= (b_{31}a_{11}+b_{32}a_{21})x_1+(b_{31}a_{12}+b_{32}a_{22})x_2+(b_{31}a_{13}+b_{32}a_{23})x_3,\\ z_4 &= (b_{41}a_{11}+b_{42}a_{21})x_1+(b_{41}a_{12}+b_{42}a_{22})x_2+(b_{41}a_{13}+b_{42}a_{23})x_3. \end{aligned} \tag{3.4.5}$$

The relation between the $x$'s and the $z$'s is therefore specified by the
matrix

$$\begin{pmatrix} b_{11}a_{11}+b_{12}a_{21} & b_{11}a_{12}+b_{12}a_{22} & b_{11}a_{13}+b_{12}a_{23}\\ b_{21}a_{11}+b_{22}a_{21} & b_{21}a_{12}+b_{22}a_{22} & b_{21}a_{13}+b_{22}a_{23}\\ b_{31}a_{11}+b_{32}a_{21} & b_{31}a_{12}+b_{32}a_{22} & b_{31}a_{13}+b_{32}a_{23}\\ b_{41}a_{11}+b_{42}a_{21} & b_{41}a_{12}+b_{42}a_{22} & b_{41}a_{13}+b_{42}a_{23} \end{pmatrix},$$

which is seen to be equal to the matrix product

$$\begin{pmatrix} b_{11} & b_{12}\\ b_{21} & b_{22}\\ b_{31} & b_{32}\\ b_{41} & b_{42} \end{pmatrix} \begin{pmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23} \end{pmatrix} = \mathbf{BA},$$

where $\mathbf{A} = (a_{ij})$, $\mathbf{B} = (b_{ij})$.

The result just obtained has been arrived at after a certain
amount of computation, and the work would naturally be even
heavier if a larger number of variables were involved. However,
the conclusion that $\mathbf{BA}$ is the matrix specifying the relation between
the $x$'s and $z$'s can be reached much more rapidly as follows.

Put $\mathbf{x} = (x_1, x_2, x_3)^T$, $\mathbf{y} = (y_1, y_2)^T$, $\mathbf{z} = (z_1, z_2, z_3, z_4)^T$.
Then (3.4.3) and (3.4.4) can be written in the form

$$\mathbf{y} = \mathbf{Ax}, \qquad \mathbf{z} = \mathbf{By}.$$

Hence

$$\mathbf{z} = \mathbf{B}(\mathbf{Ax}),$$

and so, by the associative law, we have

$$\mathbf{z} = (\mathbf{BA})\mathbf{x},$$

which is equivalent to (3.4.5).

The rapidity of the process demonstrates the superiority of
matrix technique over straightforward calculation. Matrices are
thus seen to be an effective tool for manipulating linear substitutions. The principal reason for their effectiveness lies in the appropriateness of the definition of a matrix product; we can now see
that this has been defined in precisely such a way that the resultant
of two successive substitutions specified respectively by the matrices
$\mathbf{A}$ and $\mathbf{B}$ is itself specified by $\mathbf{BA}$.
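The same conclusion can be seen numerically: substituting twice gives the same vector as applying the single matrix $\mathbf{BA}$ once. The sketch below assumes NumPy is available and uses random integer coefficients (our own illustrative choice); the shapes match the substitutions (3.4.3) and (3.4.4), three $x$'s, two $y$'s and four $z$'s.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.integers(-3, 4, size=(2, 3))   # y = A x : from (x1, x2, x3) to (y1, y2)
B = rng.integers(-3, 4, size=(4, 2))   # z = B y : from (y1, y2) to (z1, ..., z4)
x = rng.integers(-3, 4, size=3)

z_in_two_steps = B @ (A @ x)           # substitute, then substitute again
z_in_one_step = (B @ A) @ x            # use the single 4 x 3 matrix BA
print(np.array_equal(z_in_two_steps, z_in_one_step))   # True
```

Note the order: the substitution applied first (A) stands on the right of the product.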

A matrix was originally defined as a rectangular array of
numbers. We see now that it is convenient to think of a matrix
as a ‘transformation of vectors’. Thus the equation (3.4.2) associates with every vector $\mathbf{x}$ of order $n$ a certain vector $\mathbf{y}$ of order $m$,
and the mode of associating these pairs of vectors is represented
by the matrix $\mathbf{A}$. The idea of a transformation of vectors points to
fresh possibilities inherent in the concept of a matrix. We shall not,
for the moment, pursue this topic but return to it in Chapter IV.

Exercise 3.4.1. Let $\mathbf{x}$ range over a vector space of order $n$ and let $\mathbf{A}$ be
a fixed $m \times n$ matrix. Show that $\mathbf{Ax}$ ranges over a vector space of order $m$.

3.5. Adjugate matrices 

In this section all matrices are assumed to be square matrices of 
order n. 

Definition 3.5.1. (i) Let $\mathbf{A} = (a_{ij})$. The determinant $|a_{ij}|$ is
known as the determinant of $\mathbf{A}$, and is denoted by $|\mathbf{A}|$ or $\det \mathbf{A}$.

(ii) The matrix $\mathbf{A}$ is singular or non-singular according as
$|\mathbf{A}| = 0$ or $|\mathbf{A}| \neq 0$.

Exercise 3.5.1. Show that $|\alpha\mathbf{A}| = \alpha^n|\mathbf{A}|$.

Theorem 3.5.1. $|\mathbf{AB}| = |\mathbf{A}|.|\mathbf{B}| = |\mathbf{BA}|$.

In other words, the determinant of a matrix product is equal to
the product of the determinants of the matrix factors. We note,
in particular, that although $\mathbf{AB}$ and $\mathbf{BA}$ are generally distinct,
their determinants are equal.

Theorem 3.5.1 is an immediate corollary of the multiplication
theorem for determinants (Theorem 1.3.1, p. 12). For, writing
$\mathbf{A} = (a_{ij})$, $\mathbf{B} = (b_{ij})$, $\mathbf{AB} = (c_{ij})$, we have

$$\sum_{k=1}^{n} a_{ik}b_{kj} = c_{ij} \qquad (i, j = 1,\ldots,n),$$

and so $|a_{ij}|.|b_{ij}| = |c_{ij}|$, i.e. $|\mathbf{A}|.|\mathbf{B}| = |\mathbf{AB}|$.

Interchanging $\mathbf{A}$ and $\mathbf{B}$ we obtain $|\mathbf{B}|.|\mathbf{A}| = |\mathbf{BA}|$, and the
theorem is therefore proved.
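Theorem 3.5.1 is easy to spot-check with exact arithmetic. The Python sketch below (hypothetical helper names `det` and `matmul`; matrices as lists of rows) evaluates determinants by Gaussian elimination over the rationals, using the standard-library `fractions.Fraction` type so that no rounding occurs, and compares $|\mathbf{AB}|$, $|\mathbf{A}|.|\mathbf{B}|$ and $|\mathbf{BA}|$ for one pair of integer matrices. Such a check illustrates the theorem; it does not replace the proof.

```python
from fractions import Fraction

def det(M):
    """Determinant of a square matrix (list of rows) by Gaussian elimination
    over the rationals; exact for integer or Fraction entries."""
    A = [[Fraction(x) for x in row] for row in M]
    n = len(A)
    d = Fraction(1)
    for c in range(n):
        # look for a row with a non-zero entry in column c to use as pivot
        p = next((r for r in range(c, n) if A[r][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            A[c], A[p] = A[p], A[c]    # interchanging two rows changes the sign
            d = -d
        d *= A[c][c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for k in range(c, n):
                A[r][k] -= f * A[c][k]
    return d

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[2, 1, 0], [1, 3, -1], [0, 2, 4]]
B = [[1, -1, 2], [0, 1, 3], [5, 0, 1]]
print(det(A), det(B), det(matmul(A, B)), det(matmul(B, A)))   # 24 -24 -576 -576
```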


Theorem 3.5.1 extends at once to any number of factors. In
particular we have, for any non-negative integer $k$,

$$|\mathbf{A}^k| = |\mathbf{A}|^k.$$

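Our next definition makes use of cofactors. These are defined as
in the theory of determinants (Definition 1.4.1, p. 14).

Definition 3.5.2. The adjugate matrix $\mathbf{A}^*$ of $\mathbf{A}$ is the transpose of the matrix of cofactors of the elements of $\mathbf{A}$.

In other words, $\mathbf{A}^* = (A_{ij})^T$, where $\mathbf{A} = (a_{ij})$ and $A_{ij}$ denotes
the cofactor of the element $a_{ij}$ in $|\mathbf{A}|$.

Exercise 3.5.2. Show that $(\alpha\mathbf{A})^* = \alpha^{n-1}\mathbf{A}^*$.

Theorem 3.5.2. $\mathbf{AA}^* = \mathbf{A}^*\mathbf{A} = |\mathbf{A}|\mathbf{I}$.

Let $\mathbf{A} = (a_{ij})$, $\mathbf{A}^* = (b_{ij})$, so that $b_{ij} = A_{ji}$. Then, using
Theorem 1.4.3 (p. 20), we obtain, for $i, j = 1,\ldots,n$,

$$(\mathbf{AA}^*)_{ij} = \sum_{k=1}^{n} a_{ik}b_{kj} = \sum_{k=1}^{n} a_{ik}A_{jk} = |\mathbf{A}|\delta_{ij},$$

$$(\mathbf{A}^*\mathbf{A})_{ij} = \sum_{k=1}^{n} b_{ik}a_{kj} = \sum_{k=1}^{n} A_{ki}a_{kj} = |\mathbf{A}|\delta_{ij}.$$

These relations are equivalent to the assertion.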
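Definition 3.5.2 and Theorem 3.5.2 can likewise be tried out on small integer matrices. In this sketch the helper names `minor`, `det`, `adjugate` and `matmul` are our own, and the determinant is computed by Laplace expansion along the first row, which is simple but practical only for small orders. For a $1 \times 1$ matrix we take $\mathbf{A}^* = (1)$, a convention under which $\mathbf{AA}^* = |\mathbf{A}|\mathbf{I}$ still holds.

```python
def minor(M, i, j):
    """M with row i and column j deleted."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(M) if k != i]

def det(M):
    """Laplace expansion along the first row (exact for integers)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det(minor(M, 0, j)) for j in range(len(M)))

def adjugate(M):
    """Transpose of the matrix of cofactors: entry (i, j) is the cofactor of M[j][i]."""
    n = len(M)
    if n == 1:
        return [[1]]
    return [[(-1) ** (i + j) * det(minor(M, j, i)) for j in range(n)] for i in range(n)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[2, 1, 0], [1, 3, -1], [0, 2, 4]]
print(det(A))                    # 24
print(matmul(A, adjugate(A)))    # [[24, 0, 0], [0, 24, 0], [0, 0, 24]]
```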


It is interesting to recall at this stage the result of Theorem 1.6.2
(p. 24). In the language of matrices this states that

$$|\mathbf{A}^*| = |\mathbf{A}|^{n-1}. \tag{3.5.1}$$

The proof given in § 1.5 may now be formulated as follows. By
Theorem 3.5.2 we have

$$|\mathbf{AA}^*| = |(\det \mathbf{A})\mathbf{I}|.$$

Hence, by Theorem 3.5.1 and Exercise 3.5.1,

$$|\mathbf{A}|.|\mathbf{A}^*| = |\mathbf{A}|^n, \tag{3.5.2}$$

and (3.5.1) now follows by virtue of Theorem 1.5.1.

In § 1.6.3 we gave an alternative proof of (3.5.1) which was
independent of Theorem 1.5.1. We shall now sketch yet another
proof, again independent of Theorem 1.5.1, and employing a useful


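new device. Let $\mathbf{A} = (a_{ij})$ and consider the matrix

$$\mathbf{A}-x\mathbf{I} = \begin{pmatrix} a_{11}-x & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22}-x & \cdots & a_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ a_{n1} & a_{n2} & \cdots & a_{nn}-x \end{pmatrix}.$$

The determinant $|\mathbf{A}-x\mathbf{I}|$ is obviously a polynomial in $x$ with leading
term $(-1)^n x^n$. Hence, for all sufficiently large values of $x$ (say for
$x > x_0$), $|\mathbf{A}-x\mathbf{I}| \neq 0$. Again, each element of $(\mathbf{A}-x\mathbf{I})^*$ is a polynomial in $x$, and so $|(\mathbf{A}-x\mathbf{I})^*|$ is a polynomial in $x$.

Rewriting (3.5.2) with $\mathbf{A}-x\mathbf{I}$ in place of $\mathbf{A}$ we have

$$|\mathbf{A}-x\mathbf{I}|.|(\mathbf{A}-x\mathbf{I})^*| = |\mathbf{A}-x\mathbf{I}|^n.$$

Hence, for $x > x_0$,

$$|(\mathbf{A}-x\mathbf{I})^*| = |\mathbf{A}-x\mathbf{I}|^{n-1}.$$

Here both sides are polynomials in $x$, and since they are equal for
$x > x_0$ they are equal for all values of $x$, by the corollary to Theorem
1.6.3 (p. 31). In particular, the two sides are equal for $x = 0$, and
the identity (3.5.1) is therefore proved.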

Theorem 3.5.3. $(\mathbf{A}^*)^* = |\mathbf{A}|^{n-2}\mathbf{A}$.

For $n = 2$ the factor $|\mathbf{A}|^{n-2}$ is interpreted as 1, no matter whether
$\mathbf{A}$ is singular or not. In that case the proof is trivial and we shall
at once assume that $n > 2$. We write, for simplicity, $\mathbf{A}^{**}$ for $(\mathbf{A}^*)^*$.
Applying Theorem 3.5.2 with $\mathbf{A}^*$ in place of $\mathbf{A}$ and using (3.5.1) we
obtain

$$\mathbf{A}^*\mathbf{A}^{**} = |\mathbf{A}^*|\mathbf{I} = |\mathbf{A}|^{n-1}\mathbf{I}.$$

Hence

$$(\mathbf{AA}^*)\mathbf{A}^{**} = |\mathbf{A}|^{n-1}\mathbf{A},$$

and so, again by Theorem 3.5.2,

$$|\mathbf{A}|\mathbf{A}^{**} = |\mathbf{A}|^{n-1}\mathbf{A}.$$

Thus

$$|\mathbf{A}|(\mathbf{A}^{**})_{ij} = |\mathbf{A}|^{n-1}a_{ij} \qquad (i, j = 1,\ldots,n).$$

Both sides in this relation are polynomials in the elements of $\mathbf{A}$,
and since $|\mathbf{A}|$ does not vanish identically it follows, by Theorem
1.6.1, that

$$(\mathbf{A}^{**})_{ij} = |\mathbf{A}|^{n-2}a_{ij}.$$

This is equivalent to the theorem. Just as in the case of the identity
(3.5.1) so here, too, an alternative proof depending on the consideration of the matrix $\mathbf{A}-x\mathbf{I}$ can easily be constructed.

Exercise 3.5.3. Write out such a proof in detail.
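Identity (3.5.1) and Theorem 3.5.3 hold for singular matrices too, even though the division by $|\mathbf{A}|$ used in the first proofs is then not available. The following self-contained sketch (hypothetical helpers: Laplace-expansion `det`, cofactor `adjugate`, and the checker `adjugate_identities_hold`) tests both for square integer matrices of order $n \geq 2$. Python evaluates `0 ** 0` as 1, which matches the convention that $|\mathbf{A}|^{n-2}$ means 1 when $n = 2$.

```python
def minor(M, i, j):
    return [row[:j] + row[j + 1:] for k, row in enumerate(M) if k != i]

def det(M):
    # Laplace expansion along the first row
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det(minor(M, 0, j)) for j in range(len(M)))

def adjugate(M):
    # entry (i, j) is the cofactor of M[j][i]; intended for order n >= 2
    n = len(M)
    return [[(-1) ** (i + j) * det(minor(M, j, i)) for j in range(n)] for i in range(n)]

def adjugate_identities_hold(A):
    """Check |A*| = |A|^(n-1) and (A*)* = |A|^(n-2) A for a square A of order n >= 2."""
    n = len(A)
    d = det(A)
    adj = adjugate(A)
    scaled_A = [[d ** (n - 2) * a for a in row] for row in A]
    return det(adj) == d ** (n - 1) and adjugate(adj) == scaled_A

print(adjugate_identities_hold([[2, 1, 0], [1, 3, -1], [0, 2, 4]]))   # True (|A| = 24)
print(adjugate_identities_hold([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))    # True (|A| = 0)
```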

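Theorem 3.5.4. $(\mathbf{AB})^* = \mathbf{B}^*\mathbf{A}^*$.

Making use of Theorems 3.5.1 and 3.5.2 we obtain

$$\begin{aligned} |\mathbf{A}|.|\mathbf{B}|\mathbf{B}^*\mathbf{A}^* &= |\mathbf{AB}|\mathbf{I}(\mathbf{B}^*\mathbf{A}^*) = (\mathbf{AB})^*(\mathbf{AB})(\mathbf{B}^*\mathbf{A}^*)\\ &= (\mathbf{AB})^*\mathbf{A}(\mathbf{BB}^*)\mathbf{A}^* = (\mathbf{AB})^*\mathbf{A}|\mathbf{B}|\mathbf{A}^*\\ &= |\mathbf{A}|.|\mathbf{B}|(\mathbf{AB})^*. \end{aligned} \tag{3.5.3}$$

Hence

$$|\mathbf{A}|.|\mathbf{B}|(\mathbf{B}^*\mathbf{A}^*)_{ij} = |\mathbf{A}|.|\mathbf{B}|\{(\mathbf{AB})^*\}_{ij} \qquad (i, j = 1,\ldots,n).$$

All the expressions involved here are polynomials in the
elements of $\mathbf{A}$ and $\mathbf{B}$. Since, moreover, $|\mathbf{A}|.|\mathbf{B}|$ does not vanish
identically, it follows by Theorem 1.5.1 that

$$(\mathbf{B}^*\mathbf{A}^*)_{ij} = \{(\mathbf{AB})^*\}_{ij} \qquad (i, j = 1,\ldots,n),$$

and this is, in fact, our assertion.

An alternative argument of the now familiar kind runs as follows.
Rewriting (3.5.3) with $\mathbf{A}-x\mathbf{I}$, $\mathbf{B}-x\mathbf{I}$ in place of $\mathbf{A}, \mathbf{B}$ respectively,
we have

$$|\mathbf{A}-x\mathbf{I}|.|\mathbf{B}-x\mathbf{I}|(\mathbf{B}-x\mathbf{I})^*(\mathbf{A}-x\mathbf{I})^* = |\mathbf{A}-x\mathbf{I}|.|\mathbf{B}-x\mathbf{I}|\{(\mathbf{A}-x\mathbf{I})(\mathbf{B}-x\mathbf{I})\}^*.$$

Now there exists a number $x_1$ such that, for all $x > x_1$,

$$|\mathbf{A}-x\mathbf{I}| \neq 0, \qquad |\mathbf{B}-x\mathbf{I}| \neq 0.$$

It follows that

$$(\mathbf{B}-x\mathbf{I})^*(\mathbf{A}-x\mathbf{I})^* = \{(\mathbf{A}-x\mathbf{I})(\mathbf{B}-x\mathbf{I})\}^* \qquad (x > x_1).$$

Comparing the matrices on the two sides element by element and
using the corollary to Theorem 1.6.3, we see that this identity holds for all $x$; putting $x = 0$ we complete the proof of
Theorem 3.5.4.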
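The reversal of order in Theorem 3.5.4 is easy to observe on examples. The self-contained sketch below (the same kind of hypothetical list-of-rows helpers as before) confirms $(\mathbf{AB})^* = \mathbf{B}^*\mathbf{A}^*$ for a singular $\mathbf{A}$, a case the proof reaches only through the polynomial argument, and shows with a second pair $\mathbf{P}, \mathbf{Q}$ that the product in the other order, $\mathbf{P}^*\mathbf{Q}^*$, is generally different.

```python
def minor(M, i, j):
    return [row[:j] + row[j + 1:] for k, row in enumerate(M) if k != i]

def det(M):
    # Laplace expansion along the first row
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det(minor(M, 0, j)) for j in range(len(M)))

def adjugate(M):
    # entry (i, j) is the cofactor of M[j][i]; intended for order n >= 2
    n = len(M)
    return [[(-1) ** (i + j) * det(minor(M, j, i)) for j in range(n)] for i in range(n)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]     # singular
B = [[2, 1, 0], [1, 3, -1], [0, 2, 4]]
print(adjugate(matmul(A, B)) == matmul(adjugate(B), adjugate(A)))   # True

P = [[1, 1], [0, 1]]
Q = [[1, 0], [1, 1]]
print(adjugate(matmul(P, Q)))              # [[1, -1], [-1, 2]], equal to Q*P*
print(matmul(adjugate(P), adjugate(Q)))    # [[2, -1], [-1, 1]], which is P*Q*
```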

3.6. Inverse matrices 

3.6.1. In § 3.3 we dealt with addition, subtraction, and multi¬ 
plication of matrices. We shall next investigate to what extent the 



INVERSE MATRICES 


91 


HI. § 3.6 

laws of division carry over from number algebra to matrix algebra. 
It will be recalled that we introduced division in number algebra by considering the equation $ax = 1$. An analogous procedure is now to be followed for matrices. Unless the contrary is stated it is again assumed that all matrices are of type $n \times n$.

Theorem 3.6.1. (i) If $|A| \neq 0$, then the matrix equations

$$AX = I, \tag{3.6.1}$$

$$XA = I \tag{3.6.2}$$

are both uniquely soluble for $X$ and possess the common solution

$$X = \frac{1}{|A|}\,A^*. \tag{3.6.3}$$

(ii) If at least one of the equations (3.6.1), (3.6.2) is soluble for $X$, then $|A| \neq 0$, and so both equations are soluble and possess the common unique solution (3.6.3).

If $|A| \neq 0$, then, by virtue of Theorem 3.5.2, the matrix $X$ given by (3.6.3) satisfies $AX = XA = I$, and is therefore a solution of both (3.6.1) and (3.6.2). To show that this solution is unique, suppose that $Y$ also satisfies (3.6.1), i.e. $AY = I$. Then

$$X = X(AY) = (XA)Y = IY = Y.$$

Hence the solution of (3.6.1) is unique; and a similar argument applies to (3.6.2).

To prove (ii) suppose, for instance, that (3.6.1) is soluble. Then, by Theorem 3.5.1, $|A|\cdot|X| = |I| = 1$, and so $|A| \neq 0$.

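Definition 3.6.1. If $|A| \neq 0$, then the common unique solution of equations (3.6.1) and (3.6.2) of the previous theorem is called the INVERSE (matrix) of $A$, and is denoted by $A^{-1}$. The operation of obtaining $A^{-1}$ from $A$ is known as inversion.

Consider, for instance, the matrix

$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix},$$

where $|A| = ad - bc \neq 0$. Clearly

$$A^* = \begin{pmatrix} d & -b \\ -c & a \end{pmatrix},$$

and so

$$A^{-1} = \begin{pmatrix} d/(ad-bc) & -b/(ad-bc) \\ -c/(ad-bc) & a/(ad-bc) \end{pmatrix}.$$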




If $A$ is not a square matrix, or if it is a singular square matrix, then $A^{-1}$ is not defined.
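The $2 \times 2$ formula above translates directly into code. A minimal sketch, assuming exact rational arithmetic through Python's `fractions.Fraction` (the function name `inverse_2x2` is our own); a singular matrix is rejected, in line with the remark that $A^{-1}$ is then not defined.

```python
from fractions import Fraction


def inverse_2x2(a, b, c, d):
    """Inverse of ((a, b), (c, d)) from A^{-1} = A*/|A|, where A* = ((d, -b), (-c, a))."""
    det = a * d - b * c
    if det == 0:
        raise ValueError("ad - bc = 0: the matrix is singular, so no inverse is defined")
    det = Fraction(det)
    return [[d / det, -b / det], [-c / det, a / det]]


A = [[1, 2], [3, 4]]                      # |A| = -2
inv = inverse_2x2(1, 2, 3, 4)
product = [[sum(A[i][k] * inv[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
print(inv == [[-2, 1], [Fraction(3, 2), Fraction(-1, 2)]], product == [[1, 0], [0, 1]])
```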

Both terminology and notation relating to inverse matrices are chosen by analogy with elementary algebra, where with any number $a \neq 0$ is associated a unique inverse number $a^{-1}$ having the property that $aa^{-1} = a^{-1}a = 1$. As we have seen, every non-singular square matrix $A$ possesses a unique inverse $A^{-1}$ such that

$$AA^{-1} = A^{-1}A = I.$$

Exercise 3.6.1. Show that $I^{-1} = I$. Also find the inverse of $\mathrm{dg}(\alpha_1,\ldots,\alpha_n)$, assuming that $\alpha_1 \neq 0,\ldots,\alpha_n \neq 0$.

The introduction of inverse matrices enables us to recognize a result anticipated a little earlier, namely, the analogy between the behaviour of scalar matrices and that of scalars. We have, in fact,

$$\alpha I\cdot\beta I = (\alpha\beta)I, \qquad (\alpha I)^{-1} = \alpha^{-1}I \quad (\alpha \neq 0).$$

Thus there is a complete correspondence between the scalars and the scalar matrices, each scalar $\alpha$ being associated with the matrix $\alpha I$. It is this correspondence which motivates the use of the term ‘scalar matrix’.

In elementary algebra one of the first problems we deal with is 
the solution of the equation ax = b. The corresponding problem 
in matrix algebra will now be considered. 

Theorem 3.6.2. (i) If $A$ is a non-singular $m \times m$ matrix and $B$ is an $m \times n$ matrix, then the matrix equation

$$AX = B \tag{3.6.4}$$

possesses the unique solution

$$X = A^{-1}B. \tag{3.6.5}$$

(ii) If $C$ is a non-singular $n \times n$ matrix and $D$ is an $m \times n$ matrix, then the matrix equation

$$YC = D$$

possesses the unique solution

$$Y = DC^{-1}.$$

It is sufficient to consider (i) since (ii) is proved similarly. If the equation (3.6.4) possesses a solution, then, premultiplying both sides by $A^{-1}$, we obtain (3.6.5). Thus a solution, if it exists, is




unique and is given by (3.6.5). Moreover, $X$ as given by (3.6.5) is clearly a solution of (3.6.4), since $A(A^{-1}B) = (AA^{-1})B = B$.
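In computation one rarely forms $A^{-1} = A^*/|A|$ to solve (3.6.4). A common alternative, sketched below, is Gauss-Jordan elimination on the augmented array $(A \mid B)$: the row operations amount to premultiplying by $A^{-1}$, so the right-hand block ends up as $A^{-1}B$. This is a swapped-in technique rather than the method of the text; `solve_left` is a hypothetical name, and exact `Fraction` arithmetic is assumed to sidestep rounding.

```python
from fractions import Fraction


def solve_left(A, B):
    """Solve AX = B (A: m x m non-singular, B: m x n) by Gauss-Jordan elimination on (A | B)."""
    m = len(A)
    M = [[Fraction(v) for v in A[i]] + [Fraction(v) for v in B[i]] for i in range(m)]
    for c in range(m):
        p = next((r for r in range(c, m) if M[r][c] != 0), None)   # pivot row for column c
        if p is None:
            raise ValueError("A is singular, so (3.6.4) need not have a unique solution")
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]                 # scale the pivot row to make the pivot 1
        for r in range(m):
            if r != c and M[r][c] != 0:                # clear column c in every other row
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[m:] for row in M]                      # the right-hand block is now A^{-1} B


A = [[2, 1], [1, 1]]
B = [[3, 5, 1], [2, 3, 0]]
X = solve_left(A, B)
print(X == [[1, 2, 1], [1, 1, -1]])  # prints: True
```

Part (ii) can be reduced to part (i) by transposition, solving $C'Y' = D'$ for $Y'$.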

Theorem 3.6.3. (i) If $A$ is a non-singular matrix, then so is $A^{-1}$, and $(A^{-1})^{-1} = A$. Moreover $|A^{-1}| = |A|^{-1}$.

(ii) If $A$, $B$ are non-singular matrices, then so is $AB$; and

$$(AB)^{-1} = B^{-1}A^{-1}. \tag{3.6.6}$$

If $A$ is non-singular, then $A^{-1}$ exists and

$$A^{-1}A = I. \tag{3.6.7}$$

Hence the equation $A^{-1}X = I$ is soluble and has $X = A$ as a solution. In view of Theorem 3.6.1 it therefore follows that $A^{-1}$ is non-singular and that the above equation has the unique solution $X = (A^{-1})^{-1}$. Hence $(A^{-1})^{-1} = A$. Moreover, by (3.6.7) and Theorem 3.5.1, $|A^{-1}|\cdot|A| = 1$, so that $|A^{-1}| = |A|^{-1}$.

Again, if $A$, $B$ are non-singular, then, by Theorem 3.5.1, $AB$ is non-singular; and we have

$$(AB)(B^{-1}A^{-1}) = A(BB^{-1})A^{-1} = AA^{-1} = I.$$

Hence the equation $(AB)X = I$ has a solution $X = B^{-1}A^{-1}$. But, by Theorem 3.6.1, the unique solution of this equation is given by $X = (AB)^{-1}$, and (3.6.6) therefore follows. We may note in passing the analogy between (3.6.6) and Theorem 3.5.4.
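The reversal of order in (3.6.6) is easy to confirm on a pair of non-commuting matrices. The sketch assumes $2 \times 2$ matrices (so that the explicit inverse formula can be reused) and uses hypothetical helper names.

```python
from fractions import Fraction


def matmul(X, Y):
    """Ordinary matrix product of matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def inverse_2x2(M):
    """A^{-1} = A*/|A| for a 2 x 2 matrix, using exact rational arithmetic."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("singular matrix: the inverse is not defined")
    return [[d / det, -b / det], [-c / det, a / det]]


A = [[1, 2], [0, 1]]
B = [[1, 0], [3, 1]]
AB_inv = inverse_2x2(matmul(A, B))
print(AB_inv == matmul(inverse_2x2(B), inverse_2x2(A)))   # True, as (3.6.6) asserts
print(AB_inv == matmul(inverse_2x2(A), inverse_2x2(B)))   # False: the order matters
```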

Exercise 3.6.2. Show that, if $A, B, \ldots, K$ are non-singular, then so is their product $AB\cdots K$; and $(AB\cdots K)^{-1} = K^{-1}\cdots B^{-1}A^{-1}$.


Definition 3.6.2. If $A$ is non-singular and $r$ is a positive integer, then $A^{-r} = (A^{-1})^r$.

We can now extend the index laws, established in § 3.3 for non-negative indices, to all integral indices.

Theorem 3.6.4. For all integral values of $r$, $s$

$$A^rA^s = A^{r+s}, \tag{3.6.8}$$

$$(A^r)^s = A^{rs}, \tag{3.6.9}$$

provided only that the matrices in question are defined.

If $r$ is a positive integer, then, by Exercise 3.6.2,

$$(AA\cdots A)^{-1} = A^{-1}\cdots A^{-1}A^{-1},$$

where each product comprises $r$ factors. Thus

$$(A^r)^{-1} = (A^{-1})^r,$$





and so, by Definition 3.6.2,

$$A^{-r} = (A^{-1})^r = (A^r)^{-1} \qquad (r > 0). \tag{3.6.10}$$

We first prove (3.6.8). This is already known to hold for $r \ge 0$, $s \ge 0$. If both $r$ and $s$ are negative, put $r = -\rho$, $s = -\sigma$. Then, by (3.6.10),

$$A^rA^s = A^{-\rho}A^{-\sigma} = (A^{-1})^{\rho}(A^{-1})^{\sigma} = (A^{-1})^{\rho+\sigma} = A^{-\rho-\sigma} = A^{r+s}.$$

If just one of $r$, $s$ is negative, let, for instance, $r > 0$, $s < 0$ and write $s = -\sigma$. Then, by (3.6.10),

$$A^rA^s = A^rA^{-\sigma} = A^r(A^{-1})^{\sigma}. \tag{3.6.11}$$

Now, if $r \ge \sigma$, then, since $AA^{-1} = I$, the right-hand side of (3.6.11) is equal to $A^{r-\sigma} = A^{r+s}$. If, on the other hand, $r < \sigma$, then the right-hand side of (3.6.11) is equal to

$$(A^{-1})^{\sigma-r} = A^{r-\sigma} = A^{r+s}.$$

Next, consider (3.6.9). This is already known to hold for $r \ge 0$, $s \ge 0$. When $r \ge 0$, $s < 0$, write $s = -\sigma$. Then, by (3.6.10),

$$(A^r)^s = (A^r)^{-\sigma} = \{(A^r)^{\sigma}\}^{-1} = (A^{r\sigma})^{-1} = A^{-r\sigma} = A^{rs}.$$

Again, let $r < 0$, $s \ge 0$. Writing $r = -\rho$ and using (3.6.10) we obtain

$$(A^r)^s = \{(A^{\rho})^{-1}\}^s = (A^{\rho})^{-s};$$

hence, by the previous case,

$$(A^r)^s = A^{-\rho s} = A^{rs}.$$

Finally, let $r < 0$, $s < 0$ and write $r = -\rho$, $s = -\sigma$. Then, by (3.6.10),

$$(A^r)^{-1} = (A^{-\rho})^{-1} = \{(A^{\rho})^{-1}\}^{-1} = A^{\rho} = A^{-r},$$

and therefore

$$(A^r)^s = (A^r)^{-\sigma} = \{(A^r)^{-1}\}^{\sigma} = (A^{-r})^{\sigma}.$$

But, by the previous case, the expression on the right is equal to $A^{-r\sigma} = A^{rs}$, and the proof is therefore complete.
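Definition 3.6.2 and the index laws (3.6.8), (3.6.9) can be spot-checked by brute force over a range of exponents. In this sketch `mpow` is a hypothetical helper that follows Definition 3.6.2 literally; for brevity it is restricted to non-singular $2 \times 2$ matrices and uses exact `Fraction` arithmetic.

```python
from fractions import Fraction


def matmul(X, Y):
    """Ordinary matrix product of matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def inverse_2x2(M):
    """A^{-1} = A*/|A| for a non-singular 2 x 2 matrix."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("singular matrix: negative powers are not defined")
    return [[d / det, -b / det], [-c / det, a / det]]


def mpow(A, r):
    """A^r for any integer r: A^0 = I, and A^{-r} = (A^{-1})^r as in Definition 3.6.2."""
    base = A if r >= 0 else inverse_2x2(A)
    result = [[1, 0], [0, 1]]
    for _ in range(abs(r)):
        result = matmul(result, base)
    return result


A = [[2, 1], [1, 1]]   # |A| = 1, so A is non-singular
rng = range(-3, 4)
print(all(matmul(mpow(A, r), mpow(A, s)) == mpow(A, r + s) for r in rng for s in rng))  # (3.6.8)
print(all(mpow(mpow(A, r), s) == mpow(A, r * s) for r in rng for s in rng))             # (3.6.9)
```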

3.6.2. One of the most far-reaching results in elementary algebra is the division law, which states that if $ab = 0$, then $a = 0$ or $b = 0$. It is easy to see that this does not apply to matrices.

Theorem 3.6.5. The division law is not valid in matrix algebra, i.e. the equation $AB = O$, where $A$, $B$ are rectangular matrices, does not imply that $A = O$ or $B = O$.†

† Each symbol O must, of course, be interpreted as a zero matrix of the appropriate type.




Take, for instance, 



Then $A \neq O$, $B \neq O$, but $AB = O$.
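For a concrete pair (our own choice, not necessarily the one intended in the text), the following sketch exhibits $AB = O$ with $A \neq O$, $B \neq O$. Both matrices are singular, as Theorem 3.6.6 below requires, and the product in the other order is not zero.

```python
def matmul(X, Y):
    """Ordinary matrix product of matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


A = [[1, 0], [0, 0]]
B = [[0, 0], [1, 0]]
print(matmul(A, B))  # [[0, 0], [0, 0]]: AB = O although A != O and B != O
print(matmul(B, A))  # [[0, 0], [1, 0]]: BA is not O
```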

Though the division law does not hold for matrices, there is a modified form which is still valid.

Theorem 3.6.6. If $A$, $B$ are square matrices and $AB = O$, then $A = O$, or $B = O$, or $A$ and $B$ are both singular.

For suppose that $A \neq O$, $B \neq O$. If at least one of $A$, $B$ is non-singular, let, for instance, $|A| \neq 0$. Then $AB = O$ implies $A^{-1}AB = O$, i.e. $B = O$, and we have a contradiction.

Definition 3.6.3. A rectangular matrix $A$ is a divisor of zero if $A \neq O$ and if there exists a matrix $B \neq O$ such that $AB = O$ or a matrix $C \neq O$ such that $CA = O$.

Theorem 3.6.5 asserts the existence of divisors of zero in matrix 
algebra. For instance, the matrices A and B which appear in the 
proof of that theorem are both divisors of zero. 

In elementary algebra an important consequence of the division law is the cancellation law which is deduced as follows. Let $a \neq 0$ and suppose that $ax = ay$. Then $a(x - y) = 0$ and hence, by the division law, $x - y = 0$, i.e. $x = y$. An analogous argument cannot be applied to matrices, for if $A \neq O$ and $AX = AY$, then $A(X - Y) = O$, but we cannot infer from this that $X - Y = O$, since $A$ may be a divisor of zero. In fact, it is easy to write down examples which show that the relations $AX = AY$, $A \neq O$, may be compatible with $X \neq Y$. One such example is provided by the matrices



Similarly, we can find matrices which satisfy the relations $XB = YB$, $B \neq O$, $X \neq Y$. We thus arrive at the following result.

Theorem 3.6.7. The relations $AX = AY$, $A \neq O$, where $A$, $X$, $Y$ are rectangular matrices, do not imply that $X = Y$. Similarly, the relations $XB = YB$, $B \neq O$, where $B$ is also rectangular, do not imply that $X = Y$.
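Theorem 3.6.7 can likewise be illustrated with small matrices of our own choosing (a sketch, not the example referred to in the text). In each case the cancelled factor is a divisor of zero, and it fails to cancel, first from the left and then from the right.

```python
def matmul(X, Y):
    """Ordinary matrix product of matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


A = [[1, 0], [0, 0]]
X = [[1, 2], [3, 4]]
Y = [[1, 2], [5, 6]]
print(matmul(A, X) == matmul(A, Y), X == Y)      # True False: AX = AY, yet X != Y

B = [[1, 0], [0, 0]]
X2 = [[1, 2], [3, 4]]
Y2 = [[1, 5], [3, 7]]
print(matmul(X2, B) == matmul(Y2, B), X2 == Y2)  # True False: XB = YB, yet X != Y
```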




Exercise 3.6.3. Let $A$, $B$ be non-singular square matrices. Show that either of the relations $AX = AY$, $XB = YB$ implies $X = Y$.

The problem of determining which matrices are divisors of zero 
now naturally presents itself. We shall solve it here for the case of 
square matrices and deal with the general case in § 5.5.2. 

Theorem 3.6.8. A (non-zero) square matrix is a divisor of zero if and only if it is singular.

This result is almost obvious. For if $A$ is a non-singular matrix and $AX = O$ ($YA = O$), then, premultiplying (postmultiplying) by $A^{-1}$ we obtain $X = O$ ($Y = O$). Hence a non-singular matrix is not a divisor of zero. On the other hand, if $A$ is singular, there exists (by Theorem 1.6.1, p. 27) a vector $x \neq 0$ such that $Ax = 0$. Hence $A$ is then a divisor of zero.

It is, in fact, easy to obtain the stronger result that if $A$ is singular, then there exist non-zero square matrices $X$, $Y$ such that $AX = O$, $YA = O$. For suppose $A$ is of type $n \times n$ and denote by $x_1,\ldots,x_n$ any vectors (of order $n$), not all zero, such that $Ax_1 = \cdots = Ax_n = 0$. Let the $n \times n$ matrix $X$ be defined by the relations $X_{*i} = x_i$ $(i = 1,\ldots,n)$. Then $X \neq O$ and, by Theorem 3.3.5 (ii), p. 84,

$$(AX)_{*i} = AX_{*i} = Ax_i = 0 \qquad (i = 1,\ldots,n).$$

Hence $AX = O$, as required. Again, in view of Theorem 1.6.1, there exist scalars $\xi_1,\ldots,\xi_n$, not all zero, such that $(\xi_1,\ldots,\xi_n)A = 0$. Denote by $y_1,\ldots,y_n$ any $n$ row vectors, not all zero, satisfying $y_1A = \cdots = y_nA = 0$, and let $Y$ be the $n \times n$ matrix defined by the relations $Y_{i*} = y_i$ $(i = 1,\ldots,n)$. Then $Y \neq O$ and, by Theorem 3.3.5 (i),

$$(YA)_{i*} = Y_{i*}A = y_iA = 0 \qquad (i = 1,\ldots,n),$$

so that $YA = O$.
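The construction in the proof above can be carried out mechanically. This sketch assumes exact `Fraction` arithmetic; `null_vector` is a hypothetical helper that finds one non-zero solution of $Ax = 0$ by reduction to reduced row echelon form (a standard technique, not the argument of Theorem 1.6.1). For simplicity every column of $X$ is taken equal to the same null vector $x$, and every row of $Y$ equal to a row vector $y'$ with $y'A = 0$, obtained as a null vector of $A'$.

```python
from fractions import Fraction


def matmul(X, Y):
    """Ordinary matrix product of matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(M):
    return [list(col) for col in zip(*M)]


def null_vector(A):
    """A non-zero x (as a list) with Ax = 0, for a singular square matrix A."""
    n = len(A)
    M = [[Fraction(v) for v in row] for row in A]
    pivots, r = [], 0
    for c in range(n):                        # reduce M to reduced row echelon form
        p = next((i for i in range(r, n) if M[i][c] != 0), None)
        if p is None:
            continue                          # no pivot: column c is a free column
        M[r], M[p] = M[p], M[r]
        piv = M[r][c]
        M[r] = [v / piv for v in M[r]]
        for i in range(n):
            if i != r and M[i][c] != 0:
                f = M[i][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
    free = [c for c in range(n) if c not in pivots]
    if not free:
        raise ValueError("A is non-singular, so Ax = 0 only for x = 0")
    x = [Fraction(0)] * n
    x[free[0]] = Fraction(1)                  # one free unknown set to 1, the others to 0
    for row, c in enumerate(pivots):
        x[c] = -M[row][free[0]]
    return x


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]         # singular (|A| = 0)
n = len(A)
x = null_vector(A)                            # Ax = 0
y = null_vector(transpose(A))                 # A'y = 0, i.e. y'A = 0
X = [[x[i]] * n for i in range(n)]            # every column of X equals x
Y = [list(y) for _ in range(n)]               # every row of Y equals y'
print(any(v != 0 for v in x),
      all(v == 0 for row in matmul(A, X) for v in row),
      all(v == 0 for row in matmul(Y, A) for v in row))   # prints: True True True
```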

We may summarize the conclusions reached so far about matrix algebra by saying that in most respects it resembles the algebra of numbers, with the unit matrix and the zero matrix playing the parts of the numbers 1 and 0 respectively. The vital points on which the two algebras differ are that matrix multiplication is non-commutative and that the division law (and consequently the cancellation law) is not valid for matrices.

3.6.3. We conclude the present section by deriving some 
identities relating to transposition. 




It is, in the first place, obvious that

$$(\alpha A)' = \alpha A', \qquad (A + B)' = A' + B'.$$

A more interesting identity concerns the transpose of a product.

Theorem 3.6.9. If $A$, $B$ are rectangular matrices, then

$$(AB)' = B'A', \tag{3.6.12}$$

provided that either side of the equation is defined.

We note the analogy between this result, Theorem 3.5.4, and equation (3.6.6).

Let $A$, $B$ be of type $p \times q$, $r \times s$ respectively. Then $(AB)'$ exists if and only if $q = r$, and is then of type $s \times p$. The same is true of $B'A'$; and therefore, if either side of (3.6.12) exists, so does the other side and both sides are of the same type. To prove the actual equality it now suffices to note that

$$\{(AB)'\}_{ij} = (AB)_{ji} = \sum_{k=1}^{q} a_{jk}b_{ki} = \sum_{k=1}^{q} (B')_{ik}(A')_{kj} = (B'A')_{ij}.$$

Exercise 3.6.4. Show that $(AB\cdots K)' = K'\cdots B'A'$, provided that either side is defined.

A useful deduction from Theorem 3.6.9 is as follows.

Theorem 3.6.10. If $A$ is a non-singular square matrix, then $(A')^{-1} = (A^{-1})'$.

In other words, the operations of transposition and inversion can be carried out in either order.

To prove the theorem we put $B = A^{-1}$ in (3.6.12) and obtain

$$(AA^{-1})' = (A^{-1})'A',$$

i.e., since $I' = I$, $(A^{-1})'A' = I$.

The assertion now follows immediately from Theorem 3.6.1 (ii).
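Both (3.6.12) and Theorem 3.6.10 are easy to confirm on examples. The sketch below uses a rectangular pair for the former and a $2 \times 2$ matrix for the latter; the helper names are hypothetical.

```python
from fractions import Fraction


def matmul(X, Y):
    """Ordinary matrix product of matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(M):
    """A': row i of A' is column i of A."""
    return [list(col) for col in zip(*M)]


def inverse_2x2(M):
    """A^{-1} = A*/|A| for a non-singular 2 x 2 matrix."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("singular matrix: the inverse is not defined")
    return [[d / det, -b / det], [-c / det, a / det]]


A = [[1, 2, 3], [4, 5, 6]]        # 2 x 3
B = [[1, 0], [2, 1], [0, 3]]      # 3 x 2
print(transpose(matmul(A, B)) == matmul(transpose(B), transpose(A)))   # (3.6.12): True

C = [[2, 1], [7, 4]]              # |C| = 1
print(inverse_2x2(transpose(C)) == transpose(inverse_2x2(C)))          # Theorem 3.6.10: True
```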

3.7. Rational functions of a square matrix 

We have already considered positive and negative powers of 
matrices. We shall now extend this idea still further. 

Definition 3.7.1. Let $f(x)$ be a polynomial in the scalar variable $x$, say

$$f(x) = c_0 + c_1x + \cdots + c_kx^k.$$

If $A$ is a square matrix of order $n$, then $f(A)$ is defined as the (square) matrix given by the relation

$$f(A) = c_0I_n + c_1A + c_2A^2 + \cdots + c_kA^k. \tag{3.7.1}$$




An expression of the type (3.7.1) is called a matrix polynomial 
or polynomial in a matrix. By contrast, a polynomial in a scalar 
variable is known as a scalar polynomial. 

We recall that two scalar polynomials $f(x)$, $g(x)$ are said to be identically equal if $f(x) = g(x)$ for all values of $x$.

Theorem 3.7.1. If the polynomials $f(x)$, $g(x)$ are identically equal, then, for any square matrix $A$, $f(A) = g(A)$.

This result is obvious since $f(x) \equiv g(x)$ implies, as we know, that the coefficients of like powers of $x$ in $f$ and $g$ are equal. The conclusion therefore follows.

It is not difficult to extend the scope of Theorem 3.7.1. 

Theorem 3.7.2. Any polynomial identity between scalar polynomials remains valid for the corresponding matrix polynomials.

Thus, for instance, if $f_1,\ldots,f_6$ are polynomials, and

$$\{f_1(x)f_2(x) + f_3(x)\}f_4(x) \equiv f_5(x) - f_6(x),$$

then, for every square matrix $A$,

$$\{f_1(A)f_2(A) + f_3(A)\}f_4(A) = f_5(A) - f_6(A).$$

In view of Theorem 3.7.1 the proof of Theorem 3.7.2 will be complete if we can show that

$$f(x) + g(x) \equiv \phi(x) \ \text{ implies } \ f(A) + g(A) = \phi(A), \tag{3.7.2}$$

$$f(x)g(x) \equiv \phi(x) \ \text{ implies } \ f(A)g(A) = \phi(A). \tag{3.7.3}$$

The proof of (3.7.2) is trivial and we leave it to the reader. To establish (3.7.3) we write

$$f(x) = \sum_{i=0}^{r} a_ix^i, \qquad g(x) = \sum_{j=0}^{s} b_jx^j, \qquad \phi(x) = \sum_{k=0}^{t} c_kx^k.$$

Since $f(x)g(x) \equiv \phi(x)$, the coefficients of like powers of $x$ may be equated; hence $r + s = t$, and

$$c_k = \sum a_ib_j \qquad (k = 0,1,\ldots,t), \tag{3.7.4}$$

where the summation extends over all pairs of integers $i$, $j$ satisfying the conditions

$$0 \le i \le r, \qquad 0 \le j \le s, \qquad i + j = k. \tag{3.7.5}$$




Next, using the distributive law for matrices and identity (3.6.8) (p. 93), we obtain

$$f(A)g(A) = \Big(\sum_{i=0}^{r} a_iA^i\Big)\Big(\sum_{j=0}^{s} b_jA^j\Big) = \sum_{i=0}^{r}\sum_{j=0}^{s} a_iA^i\cdot b_jA^j = \sum_{i=0}^{r}\sum_{j=0}^{s} a_ib_jA^{i+j} = \sum_{k=0}^{r+s} d_kA^k,$$

where

$$d_k = \sum a_ib_j \qquad (k = 0,1,\ldots,r+s),$$

with the summation extending over all pairs of integers $i$, $j$ satisfying (3.7.5). Hence, by virtue of (3.7.4), $d_k = c_k$, and so

$$f(A)g(A) = \phi(A).$$

Since, for any two polynomials $f(x)$, $g(x)$, we have $f(x)g(x) \equiv g(x)f(x)$, it now follows that $f(A)g(A) = g(A)f(A)$. We have therefore the following corollary.

Corollary. Any two polynomials in $A$ commute with each other.
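Definition 3.7.1, the identity (3.7.3) and the corollary can be illustrated in a few lines. The sketch evaluates $f(A)$ by Horner's rule, $f(A) = (\cdots((c_kA + c_{k-1}I)A + c_{k-2}I)\cdots)A + c_0I$, which gives the same matrix as (3.7.1) with fewer multiplications; Horner's rule is our choice, not the text's. The hypothetical helper `poly_mul` forms the coefficients of $f(x)g(x)$ exactly as in (3.7.4).

```python
def matmul(X, Y):
    """Ordinary matrix product of matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def poly_eval(coeffs, A):
    """f(A) = c0 I + c1 A + ... + ck A^k for coeffs = [c0, c1, ..., ck], by Horner's rule."""
    n = len(A)
    result = [[0] * n for _ in range(n)]
    for c in reversed(coeffs):
        result = matmul(result, A)          # multiply the partial result by A ...
        for i in range(n):
            result[i][i] += c               # ... then add c I
    return result


def poly_mul(f, g):
    """Coefficients of f(x)g(x): c_k is the sum of a_i b_j over i + j = k, as in (3.7.4)."""
    c = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            c[i + j] += a * b
    return c


A = [[1, 2], [3, 4]]
f = [1, -2, 1]     # f(x) = 1 - 2x + x^2
g = [3, 0, 1]      # g(x) = 3 + x^2
fA, gA = poly_eval(f, A), poly_eval(g, A)
print(matmul(fA, gA) == poly_eval(poly_mul(f, g), A))   # (3.7.3): True
print(matmul(fA, gA) == matmul(gA, fA))                  # the corollary: True
```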

Again, let $f(x)$, $g(x)$ be two scalar polynomials and suppose that the square matrix $A$ satisfies the relation $|g(A)| \neq 0$. We have just seen that

$$f(A)g(A) = g(A)f(A). \tag{3.7.6}$$

Hence, premultiplying and postmultiplying by $\{g(A)\}^{-1}$, we obtain

$$\{g(A)\}^{-1}f(A) = f(A)\{g(A)\}^{-1}. \tag{3.7.7}$$

This identity enables us to define rational functions of $A$.

Definition 3.7.2. Let $f(x)$, $g(x)$ be scalar polynomials and let $A$ be any square matrix such that $|g(A)| \neq 0$. Then the matrix appearing on either side of (3.7.7) is known as the quotient of $f(A)$ by $g(A)$, and is denoted by $\dfrac{f(A)}{g(A)}$ or $f(A)/g(A)$.

A quotient of two polynomials in $A$ is called a rational function of $A$. If $g(x)$ is identically equal to 1, then $g(A) = I$, and it follows that any polynomial in $A$ is a special rational function of $A$.

Exercise 3.7.1. If $f$, $g$, $\phi$, $\psi$ are polynomials, show that

$$f(x)/g(x) \equiv \phi(x)/\psi(x) \ \text{ implies } \ f(A)/g(A) = \phi(A)/\psi(A)$$

for any square matrix $A$ such that $|g(A)| \neq 0$, $|\psi(A)| \neq 0$.

Theorem 3.7.2 can now be extended to rational functions. 





Theorem 3.7.3. Any identity between scalar rational functions remains valid for the corresponding rational functions of a square matrix, provided that all the latter functions are defined.

It is clearly sufficient to establish the following results:

$$\frac{f_1(x)}{f_2(x)} + \frac{f_3(x)}{f_4(x)} \equiv \frac{f_5(x)}{f_6(x)} \ \text{ implies } \ \frac{f_1(A)}{f_2(A)} + \frac{f_3(A)}{f_4(A)} = \frac{f_5(A)}{f_6(A)}; \tag{3.7.8}$$

$$\frac{f_1(x)}{f_2(x)}\cdot\frac{f_3(x)}{f_4(x)} \equiv \frac{f_5(x)}{f_6(x)} \ \text{ implies } \ \frac{f_1(A)}{f_2(A)}\cdot\frac{f_3(A)}{f_4(A)} = \frac{f_5(A)}{f_6(A)};$$

$$\frac{f_1(x)}{f_2(x)}\Big/\frac{f_3(x)}{f_4(x)} \equiv \frac{f_5(x)}{f_6(x)} \ \text{ implies } \ \frac{f_1(A)}{f_2(A)}\Big/\frac{f_3(A)}{f_4(A)} = \frac{f_5(A)}{f_6(A)}.$$

Here all the $f_i$ are polynomials and it is assumed that the rational functions of $A$ involved in each case are defined. Thus, in (3.7.8) it is assumed that $|f_2(A)| \neq 0$, $|f_4(A)| \neq 0$, $|f_6(A)| \neq 0$. We shall give a proof of (3.7.8) and leave the proofs of the remaining two statements to the reader.

The identity on the left-hand side of (3.7.8) implies

$$f_1(x)f_4(x)f_6(x) + f_3(x)f_2(x)f_6(x) \equiv f_5(x)f_2(x)f_4(x).$$

Hence, by Theorem 3.7.2,

$$f_1(A)f_4(A)f_6(A) + f_3(A)f_2(A)f_6(A) = f_5(A)f_2(A)f_4(A).$$

Premultiplying both sides by $\{f_2(A)\}^{-1}\{f_4(A)\}^{-1}\{f_6(A)\}^{-1}$ and making use of (3.7.6) and (3.7.7), we obtain

$$f_1(A)\{f_2(A)\}^{-1} + f_3(A)\{f_4(A)\}^{-1} = f_5(A)\{f_6(A)\}^{-1}.$$

The proof of (3.7.8) is therefore complete.
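Rational functions of a matrix can be handled the same way. The sketch below, restricted to $2 \times 2$ matrices for brevity and using hypothetical helper names, checks (3.7.7) and then the instance $1/x - 1/(x+1) \equiv 1/(x^2 + x)$ of (3.7.8) for a matrix $A$ with $|A| \neq 0$ and $|A + I| \neq 0$.

```python
from fractions import Fraction


def matmul(X, Y):
    """Ordinary matrix product of matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def madd(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def poly_eval(coeffs, A):
    """f(A) for f(x) = coeffs[0] + coeffs[1] x + ..., by Horner's rule."""
    n = len(A)
    result = [[Fraction(0)] * n for _ in range(n)]
    for c in reversed(coeffs):
        result = matmul(result, A)
        for i in range(n):
            result[i][i] += c
    return result


def inverse_2x2(M):
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def quotient(f, g, A):
    """f(A)/g(A) = f(A){g(A)}^{-1}, defined when |g(A)| != 0."""
    return matmul(poly_eval(f, A), inverse_2x2(poly_eval(g, A)))


A = [[2, 1], [0, 3]]                      # |A| = 6, |A + I| = 12
# (3.7.7) with f(x) = x^2 + 1, g(x) = x + 1: f(A) commutes with {g(A)}^{-1}
fA, ginv = poly_eval([1, 0, 1], A), inverse_2x2(poly_eval([1, 1], A))
print(matmul(fA, ginv) == matmul(ginv, fA))            # True
# 1/x - 1/(x + 1) = 1/(x^2 + x), carried over to A
lhs = madd(quotient([1], [0, 1], A), quotient([-1], [1, 1], A))
print(lhs == quotient([1], [0, 1, 1], A))              # True
```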


Theorem 3.7.3 shows that rational functions of matrices can be 
manipulated in accordance with the same formal rules that are 
used for the rational functions of a scalar variable. It might now be 
concluded that the theory of rational functions of matrices is 
virtually the same as that of scalar rational functions. In one 
important respect, however, the analogy breaks down completely 
for, as we shall see in § 7.4, every rational function of A is equal to 
a polynomial in A —a result which has, of course, no analogue in 
the theory of scalar rational functions. 


3.8. Partitioned matrices 

In the present section we shall introduce a very useful technical 
device which will frequently facilitate the manipulation of matrices. 




The results to be established below are almost obvious intuitively, 
even though the details of the formal argument may be found 
rather tedious. 

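Consider a rectangular matrix $A$, and let a number of lines be drawn between its rows, or columns, or both. These lines will then partition $A$ into a number of smaller arrays. Thus, if $A$ is the $4 \times 3$ matrix

$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \\ a_{41} & a_{42} & a_{43} \end{pmatrix},$$

three of the possible ways of partitioning it are as follows: (i) by one horizontal and one vertical line, into four arrays; (ii) by horizontal lines after the first and second rows and a vertical line after the second column, so that

$$\left(\begin{array}{cc|c} a_{11} & a_{12} & a_{13} \\ \hline a_{21} & a_{22} & a_{23} \\ \hline a_{31} & a_{32} & a_{33} \\ a_{41} & a_{42} & a_{43} \end{array}\right);$$

(iii) by a single horizontal line after the second row, so that

$$\left(\begin{array}{ccc} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ \hline a_{31} & a_{32} & a_{33} \\ a_{41} & a_{42} & a_{43} \end{array}\right).$$

A matrix divided in some such way by horizontal or vertical lines is called a partitioned matrix. If $\phi$ stands for the mode of partitioning of the original matrix $A$, then the resulting partitioned matrix will be denoted by $A_\phi$. If $\phi$ consists of all lines drawn between every pair of consecutive rows and every pair of consecutive columns of $A$, then $A_\phi$ is, of course, indistinguishable from $A$.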

We can represent a partitioned matrix most economically by denoting each constituent array by a single matrix symbol. Thus in the case (i) above we can write

$$A_\phi = \begin{pmatrix} A^{(11)} & A^{(12)} \\ A^{(21)} & A^{(22)} \end{pmatrix}.$$





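In the case (ii) we have

$$A_\phi = \begin{pmatrix} A^{(11)} & A^{(12)} \\ A^{(21)} & A^{(22)} \\ A^{(31)} & A^{(32)} \end{pmatrix},$$

where

$$A^{(11)} = (a_{11}\ \ a_{12}), \quad A^{(12)} = (a_{13}), \quad A^{(21)} = (a_{21}\ \ a_{22}), \quad A^{(22)} = (a_{23}),$$

$$A^{(31)} = \begin{pmatrix} a_{31} & a_{32} \\ a_{41} & a_{42} \end{pmatrix}, \quad A^{(32)} = \begin{pmatrix} a_{33} \\ a_{43} \end{pmatrix}.$$

Again, in (iii),

$$A_\phi = \begin{pmatrix} A^{(11)} \\ A^{(21)} \end{pmatrix},$$

where

$$A^{(11)} = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{pmatrix}, \quad A^{(21)} = \begin{pmatrix} a_{31} & a_{32} & a_{33} \\ a_{41} & a_{42} & a_{43} \end{pmatrix}.$$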


The constituent matrices of a partitioned matrix will be called 
its elements. It must, however, be borne in mind that an arbitrary 
rectangular array of matrices is not, in general, a partitioned matrix 
since when the constituent matrices are written out in full the 
resulting array of numbers need not, of course, be a rectangular 
array. 

If $\Gamma$ is any partitioned matrix we shall denote by $\{\Gamma\}$ the ordinary matrix obtained from $\Gamma$ by removing its partitions. Obviously $\{A_\phi\} = A$.

We can at once extend the definitions of addition and multiplication to partitioned matrices simply by replacing the operations carried out on numerical elements by the corresponding operations carried out on the matrix elements. Addition and multiplication so defined will be found to obey the same formal laws as those established for ordinary matrices. What is equally important is that in forming sums or products of partitioned matrices we arrive at precisely the same results we should have obtained if the partitions had been removed initially.† It is this last fact that makes partitioning so convenient a device.

Let us first consider addition. Here the discussion is almost 
trivial. 


t To be more precise, the two results are the same apart from the partitions. 
This qualification must be added wherever the context requires it. 





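Definition 3.8.1. Let $A$, $B$ be matrices of the same type and let the same partitioning $\phi$ be applied to them, so that

$$A_\phi = \begin{pmatrix} A^{(11)} & \cdots & A^{(1q)} \\ \vdots & & \vdots \\ A^{(p1)} & \cdots & A^{(pq)} \end{pmatrix}, \qquad B_\phi = \begin{pmatrix} B^{(11)} & \cdots & B^{(1q)} \\ \vdots & & \vdots \\ B^{(p1)} & \cdots & B^{(pq)} \end{pmatrix}.$$

Then addition is defined by the equation†

$$A_\phi + B_\phi = \begin{pmatrix} A^{(11)} + B^{(11)} & \cdots & A^{(1q)} + B^{(1q)} \\ \vdots & & \vdots \\ A^{(p1)} + B^{(p1)} & \cdots & A^{(pq)} + B^{(pq)} \end{pmatrix}.$$

We note that this definition reduces to Definition 3.3.2 (p. 78) for the case of ordinary matrices.

It is at once clear that

$$A_\phi + B_\phi = (A + B)_\phi;$$

in other words $\{A_\phi + B_\phi\} = A + B$. Thus, no matter whether we add two matrices or their partitioned forms, we obtain the same result.

It is equally clear that addition of partitioned matrices is commutative and associative, i.e.

$$A_\phi + B_\phi = B_\phi + A_\phi, \qquad A_\phi + (B_\phi + C_\phi) = (A_\phi + B_\phi) + C_\phi.$$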

The discussion of multiplication requires a little more care.

Definition 3.8.2. Let $A$, $B$ be two matrices for which $AB$ is defined, and suppose that partitions $\phi$, $\psi$ are applied to $A$, $B$ respectively, so that

$$A_\phi = \begin{pmatrix} A^{(11)} & \cdots & A^{(1q)} \\ \vdots & & \vdots \\ A^{(p1)} & \cdots & A^{(pq)} \end{pmatrix}, \qquad B_\psi = \begin{pmatrix} B^{(11)} & \cdots & B^{(1s)} \\ \vdots & & \vdots \\ B^{(r1)} & \cdots & B^{(rs)} \end{pmatrix}.$$

If $q = r$ and if every matrix product

$$A^{(ik)}B^{(kj)} \qquad (i = 1,\ldots,p;\ j = 1,\ldots,s;\ k = 1,\ldots,q)$$

is defined, then the product $A_\phi B_\psi$ is defined as the partitioned matrix of $p$ rows and $s$ columns whose $(i,j)$th element is the matrix‡

$$\sum_{k=1}^{q} A^{(ik)}B^{(kj)}.$$

† It is obvious that $A^{(ij)}$ and $B^{(ij)}$ are of the same type.

‡ It should, in the first place, be noted that the $q$ matrices $A^{(ik)}B^{(kj)}$ $(k = 1,\ldots,q)$ are all of the same type, so that they can be added together. Furthermore, any two elements of the matrix $A_\phi B_\psi$ are matrices with the same number of rows (columns) if they stand in the same row (column) of $A_\phi B_\psi$. Hence $A_\phi B_\psi$ is a partitioned matrix in the sense in which we use this term.





This definition reduces to Definition 3.3.3 (p. 80) for the case of ordinary matrices.

Theorem 3.8.1. If the product $A_\phi B_\psi$ is defined, then

$$\{A_\phi B_\psi\} = AB.$$

This result states, in effect, that partitioning of matrices does not affect their multiplication. It is extremely useful in certain cases where it is found to be more economical to multiply matrices in their partitioned form.†

Theorem 3.8.1 is almost obvious intuitively, and before we give a formal proof we shall illustrate it by a simple example.

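Let $A$, $B$ be two matrices of type $4 \times 4$, $4 \times 2$ respectively, and let them be partitioned as follows:

$$A_\phi = \left(\begin{array}{cc|cc} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ \hline a_{31} & a_{32} & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{array}\right) = \begin{pmatrix} A^{(11)} & A^{(12)} \\ A^{(21)} & A^{(22)} \end{pmatrix}, \qquad B_\psi = \left(\begin{array}{cc} b_{11} & b_{12} \\ b_{21} & b_{22} \\ \hline b_{31} & b_{32} \\ b_{41} & b_{42} \end{array}\right) = \begin{pmatrix} B^{(11)} \\ B^{(21)} \end{pmatrix}.$$

Then

$$A_\phi B_\psi = \begin{pmatrix} A^{(11)}B^{(11)} + A^{(12)}B^{(21)} \\ A^{(21)}B^{(11)} + A^{(22)}B^{(21)} \end{pmatrix} = \begin{pmatrix} C^{(11)} \\ C^{(21)} \end{pmatrix},$$

say. Here we have

$$C^{(11)} = A^{(11)}B^{(11)} + A^{(12)}B^{(21)} = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}\begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix} + \begin{pmatrix} a_{13} & a_{14} \\ a_{23} & a_{24} \end{pmatrix}\begin{pmatrix} b_{31} & b_{32} \\ b_{41} & b_{42} \end{pmatrix}$$

$$= \begin{pmatrix} a_{11}b_{11} + a_{12}b_{21} + a_{13}b_{31} + a_{14}b_{41} & a_{11}b_{12} + a_{12}b_{22} + a_{13}b_{32} + a_{14}b_{42} \\ a_{21}b_{11} + a_{22}b_{21} + a_{23}b_{31} + a_{24}b_{41} & a_{21}b_{12} + a_{22}b_{22} + a_{23}b_{32} + a_{24}b_{42} \end{pmatrix},$$

$$C^{(21)} = A^{(21)}B^{(11)} + A^{(22)}B^{(21)} = \begin{pmatrix} a_{31} & a_{32} \\ a_{41} & a_{42} \end{pmatrix}\begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix} + \begin{pmatrix} a_{33} & a_{34} \\ a_{43} & a_{44} \end{pmatrix}\begin{pmatrix} b_{31} & b_{32} \\ b_{41} & b_{42} \end{pmatrix}$$

$$= \begin{pmatrix} a_{31}b_{11} + a_{32}b_{21} + a_{33}b_{31} + a_{34}b_{41} & a_{31}b_{12} + a_{32}b_{22} + a_{33}b_{32} + a_{34}b_{42} \\ a_{41}b_{11} + a_{42}b_{21} + a_{43}b_{31} + a_{44}b_{41} & a_{41}b_{12} + a_{42}b_{22} + a_{43}b_{32} + a_{44}b_{42} \end{pmatrix}.$$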


† See, for instance, the proofs of Theorems 6.6.2, 6.4.1, and 10.3.4.



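Therefore

$$A_\phi B_\psi = \left(\begin{array}{cc} a_{11}b_{11} + a_{12}b_{21} + a_{13}b_{31} + a_{14}b_{41} & a_{11}b_{12} + a_{12}b_{22} + a_{13}b_{32} + a_{14}b_{42} \\ a_{21}b_{11} + a_{22}b_{21} + a_{23}b_{31} + a_{24}b_{41} & a_{21}b_{12} + a_{22}b_{22} + a_{23}b_{32} + a_{24}b_{42} \\ \hline a_{31}b_{11} + a_{32}b_{21} + a_{33}b_{31} + a_{34}b_{41} & a_{31}b_{12} + a_{32}b_{22} + a_{33}b_{32} + a_{34}b_{42} \\ a_{41}b_{11} + a_{42}b_{21} + a_{43}b_{31} + a_{44}b_{41} & a_{41}b_{12} + a_{42}b_{22} + a_{43}b_{32} + a_{44}b_{42} \end{array}\right),$$

and so $\{A_\phi B_\psi\} = AB$, as stated by the theorem.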

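To prove the assertion in general we shall, for the sake of brevity, write $\Gamma = A_\phi B_\psi$. We note, in the first place, that $\{\Gamma\}$ and $AB$ are matrices of the same type.

Next, consider a typical element $(AB)_{ij}$ of $AB$. Suppose that the $i$th row of $A$ is made up of the $\lambda$th rows of the matrices

$$A^{(u1)},\ldots,A^{(uq)},$$

and that the $j$th column of $B$ is made up of the $\mu$th columns of the matrices $B^{(1v)},\ldots,B^{(qv)}$.

Let $\Gamma^{(uv)}$ denote the $(u,v)$th element of $\Gamma$. Then, by Definition 3.8.2,

$$\Gamma^{(uv)} = \sum_{k=1}^{q} A^{(uk)}B^{(kv)},$$

and so†

$$(\Gamma^{(uv)})_{\lambda\mu} = \sum_{k=1}^{q} (A^{(uk)})_{\lambda *}(B^{(kv)})_{*\mu} = (AB)_{ij}, \tag{3.8.1}$$

the last step holding because, as $k$ runs from 1 to $q$, the rows $(A^{(uk)})_{\lambda *}$ together make up the $i$th row of $A$ and the columns $(B^{(kv)})_{*\mu}$ together make up the $j$th column of $B$. Hence $(AB)_{ij}$ is an element of $\{\Gamma\}$.

The matrices $\{\Gamma\}$ and $AB$ are thus of the same type and have the same elements. It remains now only to prove that the elements in the two matrices occur in the same order. Suppose that, in addition to (3.8.1), we also have

$$(\Gamma^{(u'v')})_{\lambda'\mu'} = (AB)_{i'j'}.$$

It is clear that if $i' > i$, then either $u' > u$ or $u' = u$, $\lambda' > \lambda$. Similarly, if $j' > j$, then either $v' > v$ or $v' = v$, $\mu' > \mu$. This means, in fact, that $\{\Gamma\}_{ij} = (AB)_{ij}$.

The proof of the theorem is therefore complete.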
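Theorem 3.8.1 also lends itself to a direct check. In the sketch, `split` (a hypothetical helper) partitions a matrix by listing the indices of the rows and columns before which lines are drawn, `block_product` implements Definition 3.8.2, and `unpartition` computes $\{\Gamma\}$. The second check uses an uneven partition, which is admissible because the vertical lines of $A$ match the horizontal lines of $B$.

```python
from functools import reduce


def matmul(X, Y):
    """Ordinary matrix product of matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def madd(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def split(M, row_cuts, col_cuts):
    """Partition M by horizontal lines before the rows listed in row_cuts and vertical
    lines before the columns listed in col_cuts; returns the array of blocks M^(uv)."""
    rows = [0] + row_cuts + [len(M)]
    cols = [0] + col_cuts + [len(M[0])]
    return [[[r[cols[v]:cols[v + 1]] for r in M[rows[u]:rows[u + 1]]]
             for v in range(len(cols) - 1)]
            for u in range(len(rows) - 1)]


def block_product(P, Q):
    """Definition 3.8.2: the (u, v) element is the sum over k of P^(uk) Q^(kv)."""
    q = len(Q)
    if any(len(row) != q for row in P):
        raise ValueError("A must have as many block columns as B has block rows")
    return [[reduce(madd, [matmul(P[u][k], Q[k][v]) for k in range(q)])
             for v in range(len(Q[0]))] for u in range(len(P))]


def unpartition(G):
    """{G}: remove the partitions by gluing the blocks back into one matrix."""
    return [sum((G[u][v][i] for v in range(len(G[u]))), [])
            for u in range(len(G)) for i in range(len(G[u][0]))]


A = [[4 * i + j + 1 for j in range(4)] for i in range(4)]   # 4 x 4
B = [[i - j for j in range(2)] for i in range(4)]           # 4 x 2
# the partition of the worked example: A cut after row 2 and column 2, B after row 2
print(unpartition(block_product(split(A, [2], [2]), split(B, [2], []))) == matmul(A, B))
# an uneven partition: the column cuts of A must match the row cuts of B
print(unpartition(block_product(split(A, [1], [1, 3]), split(B, [1, 3], [1]))) == matmul(A, B))
```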

† It is essential to be quite clear about the precise significance of each symbol. We repeat, then, that $\Gamma$ is the partitioned matrix $A_\phi B_\psi$; its $(u,v)$th element $\Gamma^{(uv)}$ is, of course, itself a matrix. Again, $(\Gamma^{(uv)})_{\lambda\mu}$ denotes the $(\lambda,\mu)$th element of $\Gamma^{(uv)}$; this is therefore an element of $\{\Gamma\}$ and so is a scalar.



In conclusion it should be mentioned that multiplication of partitioned matrices is associative and also distributive with respect to addition, i.e.

$$A_\phi(B_\psi C_\chi) = (A_\phi B_\psi)C_\chi, \qquad (B_\psi + C_\psi)A_\phi = B_\psi A_\phi + C_\psi A_\phi,$$

provided that the matrices in question are defined. Multiplication of partitioned matrices is obviously non-commutative.


PROBLEMS ON CHAPTER III 

1. Compute the matrix products


«' (1 ;) 0 ^ U 0 


1 -1 1 

0 0 2 

-2 -1 0 , 


-1/V2 1/V2- 

2. Find AB and BA when 

A.(l 

(ii) A = (_ 

In case (ii) find a non-zero matrix $C$ such that $BC = O$.

3. If 


2 

i\ 

/O 

0 

1 \ 

3 

2 . 

O 

II 

1 

0 

4 

6 / 


0 

0 / 

1 

i 

;)• 

B-C. 

— 

:’)■ 


$$U = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix},$$

find $UU'$, $U'U$, $U^2$,

4. Find A®, A®, and A‘ when 


(i) A 


= 11 W OJ 

\l OJ 


(co = ; (ii) A 


1 

i 

-1 


5. Find $A^*$, given that every element of the $n \times n$ matrix $A$ is equal to 1.

6. Show that, if $A' = A$, then $(A^*)' = A^*$.

7. If /I fi- n.^\ 


11 a a*\ 
= 1 b 6M, 
\l c cV 


/—be —ca 

—ab\ 

jb—c 

0 

0 \ 

P* = Ib+c c+a 

0 + 6 ) 

I ® 

c—a 

0 

\-i -1 

- 1 / 

\ 0 

0 

0 - 6 / 


show that 





8. Show that the product of two lower triangular matrices is again a lower triangular matrix. Also show that the inverse of a non-singular lower triangular matrix is a lower triangular matrix.

Find the inverse matrix of

$$\begin{pmatrix} 1 & 0 & 0 & 0 \\ 1 & 2 & 0 & 0 \\ 2 & 1 & 3 & 0 \\ 1 & 2 & 1 & 4 \end{pmatrix}.$$


1 


(O 


O)® 

OJ 1 

9. Find the inverse matrix of

where ω =

10. Let

$$A(\theta) = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}.$$

Show that $A(\theta)A(\theta') = A(\theta + \theta')$, and give a geometrical interpretation of this result.

11. $A$ is a given square matrix. Writing $A_t = (tI - A)^{-1}$ whenever $tI - A$ is non-singular, prove that

$$(t - u)A_tA_u = A_u - A_t.$$

12. Determine the condition for the matrix

$$\begin{pmatrix} 1 & -n & m \\ n & 1 & -l \\ -m & l & 1 \end{pmatrix}$$

to be non-singular; and in that case find its inverse.

13. Show that if B commutes with A, then it commutes with every 
polynomial in A. 

14. Show that if $A$ and $B$ commute, so do $A^k$ and $B^l$, where $k$, $l$ are positive integers. Discuss the case when $k$, $l$ are not necessarily positive.

15. Show that, if $P^{-1}AP = Q^{-1}BQ$, then there exist matrices $R$, $S$ such that $A = RS$, $B = SR$.

16. Let A = (a„) be an n xn matrix and write A = Show that, if 

X = then 

xi'A^’Ax = i I i aicr^\ . 

17. A (square) matrix $A$ is said to be idempotent if $A^2 = A$. Show that the sum of two idempotent matrices $A$ and $B$ is idempotent if and only if $AB = BA = O$.

18. Show that if the product of two matrices is equal to a non-zero scalar 
matrix, then the two matrices commute. Can the term ‘non-zero’ be 
omitted in this statement ? 

19. Let $A$ be a fixed $m \times m$ matrix and $B$ a variable $m \times n$ matrix. Show that the equation $AX = B$ in $X$ is soluble uniquely for every choice of $B$ if and only if $|A| \neq 0$.




20. Find all matrices which commute with the matrix

$$\begin{pmatrix} \lambda & 1 & 0 \\ 0 & \lambda & 1 \\ 0 & 0 & \lambda \end{pmatrix}.$$


21 . Show that any two matrices which commute with 



:) 


, commute 


with each other. 

22. Show that a necessary and sufficient condition for an $n \times n$ matrix to commute with every $n \times n$ matrix is that it should be scalar.

23. Express the matrix

$$A = \begin{pmatrix} 13 & 7 & 3 \\ 7 & 5 & 4 \\ 3 & 4 & 6 \end{pmatrix}$$


in the form A = where B is an upper triangular matrix with positive 

elements. 

24. Let the vectors $x_1,\ldots,x_n$, $y_1,\ldots,y_n$, of order $n$, be connected by the equations

$$y_r = \sum_{s=1}^{n} a_{rs}x_s \qquad (r = 1,\ldots,n).$$

Show that $Y = AX$, where $A = (a_{rs})$, and $X$ and $Y$ are the matrices having $x_1,\ldots,x_n$ and $y_1,\ldots,y_n$ respectively as their rows.

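25. Let

$$A = \begin{pmatrix} a & 1 \\ 0 & 1 \end{pmatrix},$$

where $a \neq 1$. Show that, for $n > 0$,

$$A^n = \begin{pmatrix} a^n & (a^n - 1)/(a - 1) \\ 0 & 1 \end{pmatrix},$$

and discuss the validity of this formula for $n < 0$. Also consider the case $a = 1$.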

26. Let $A = (a_{rs})$ be a rectangular matrix, and write

$$N(A) = \Big\{\sum_{r,s} |a_{rs}|^2\Big\}^{1/2}.$$

Show that $N(AB) \le N(A)N(B)$, and deduce Theorem 2.5.2 (p. 63).

27. Defining $N(A)$ as in the preceding question, prove, by means of Minkowski’s inequality,† that $N(A + B) \le N(A) + N(B)$. Also show that this relation is equivalent to the triangle inequality for vectors (Theorem 2.6.3, p. 64).

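28. By considering powers of the $n \times n$ matrix

$$\begin{pmatrix} 0 & 1 & 0 & 0 & \cdots & 0 \\ 0 & 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & & \ddots & \vdots \\ 0 & 0 & 0 & 0 & \cdots & 1 \\ 0 & 0 & 0 & 0 & \cdots & 0 \end{pmatrix},$$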


† See Hardy, Littlewood, and Pólya, Inequalities, 31.






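show that

$$\begin{pmatrix} \lambda & 1 & 0 & \cdots & 0 \\ 0 & \lambda & 1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda & 1 \\ 0 & 0 & \cdots & 0 & \lambda \end{pmatrix}^{N} = \begin{pmatrix} \lambda^N & \binom{N}{1}\lambda^{N-1} & \binom{N}{2}\lambda^{N-2} & \cdots & \binom{N}{n-1}\lambda^{N-n+1} \\ 0 & \lambda^N & \binom{N}{1}\lambda^{N-1} & \cdots & \binom{N}{n-2}\lambda^{N-n+2} \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda^N & \binom{N}{1}\lambda^{N-1} \\ 0 & 0 & \cdots & 0 & \lambda^N \end{pmatrix},$$

where $\binom{N}{k}$ is interpreted as 0 when $k > N$.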

29. Let $f(t)$, $g(t)$ be polynomials whose highest common factor is 1; let $M$ be an $n \times n$ matrix; and write $A = f(M)$, $B = g(M)$. Show that every solution of the equation $ABx = 0$ can be written in the form $x = y + z$, where $By = Az = 0$.

30. Show that the sum and the product of two matrices of the type
( 6 a) are matrices of the same type. Deduce that the product of

two matrices of the type 



is again a matrix of this type. 

31. $U$ is a subspace of (n > 2), which consists of all complex vectors $x$ whose last $n - 2$ components all vanish; and $A$ is a complex $n \times n$ matrix such that $|Ax| = k|x|$ for all $x \in U$, where $k$ is a constant. Prove that the first two columns of $A$ are orthogonal to each other and have length $k$.

32. Referred to a system of rectangular cartesian coordinates $OX_1$, $OX_2$, $OX_3$, the coordinates of a point $X$ are given by the column vector $x$, where $x' = (x_1, x_2, x_3)$. The direction cosines of a given line $l$ through $O$ are given by the column vector $l$, where $l' = (l_1, l_2, l_3)$. Prove that the foot of the perpendicular from $X$ to $l$ is the point $Ax$, where $A = ll'$;
and show that |^| _ ^ 


33. Find all 2 x 2 matrices which satisfy the equation 
that all but one of these matrices are singular. 


: X, and show 









34. Let be numbers such that = L and put 

af-f-a? aiflta . . . «iOn\ 

OaOi aj+ic • • • <*2®n I 

a^ai 0^02 . . . «J+a:/ 

Show that, for every value of x, AJ = — 

35. Let be distinct numbers and arbitrary numbers. 

Prove that there exists one and only one polynomial / of degree < A ;—1 
such that f{o}i) = {i — 

Show that, for any square matrix A and any distinct numbers 

A— ^ j 

0)^ — 0)^/ 

Kjti 

36. $A$, $B$, $C$, $D$ are $n \times n$ matrices, $A$ being non-singular and $AC = CA$. Prove that the determinant of the $2n \times 2n$ matrix



is equal to that of the nxn matrix AD —BC. By replacing A by 
prove that the restriction that A is non-singular can be dropped. 

37. If $E_{ij}$ is the $n \times n$ matrix whose elements all vanish except for a 1 in the $i$th row and $j$th column, show that $E_{ij}E_{kl} = \delta_{jk}E_{il}$.

If 



> 

to 

II 

> 

eo 

II 

P_p 

0 ^ 
0 0 

II 

< 

show that 

4 


where 



. Deduce that this result still holds when A^, Ag, Ag, A 4 

are replaced by arbitrary 2 x 2 matrices. 



38. Let $\phi(n)$ denote the number of integers not exceeding $n$ and prime to $n$, and write

$$D = \begin{vmatrix} (1,1) & (1,2) & \cdots & (1,n) \\ (2,1) & (2,2) & \cdots & (2,n) \\ \vdots & \vdots & & \vdots \\ (n,1) & (n,2) & \cdots & (n,n) \end{vmatrix},$$

where $(r, s)$ denotes the highest common factor of $r$ and $s$. By considering the matrix product $P^T \Phi P$, where $\Phi = \mathrm{dg}\{\phi(1), \phi(2), \ldots, \phi(n)\}$ and $p_{rs} = 1$ or $0$ according as $r$ divides or does not divide $s$, show that

$$D = \phi(1)\phi(2)\cdots\phi(n).$$
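The factorization suggested in the exercise rests on the fact that $\sum_{k \mid m} \phi(k) = m$, so that the $(r, s)$ element of $P^T \Phi P$, namely $\sum \phi(k)$ over the common divisors $k$ of $r$ and $s$, is $(r, s)$. The Python sketch below checks both the factorization and the value of $D$ for small $n$ in exact arithmetic. It is an illustration only, not a proof.

```python
from fractions import Fraction
from math import gcd, prod

def det(M):
    """Determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in M]
    n, d = len(a), Fraction(1)
    for c in range(n):
        p = next((r for r in range(c, n) if a[r][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d

def phi(n):
    """Euler's function: number of 1 <= k <= n with (k, n) = 1."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

def gcd_matrix(n):
    """The n x n matrix whose (r, s) element is (r, s)."""
    return [[gcd(r, s) for s in range(1, n + 1)] for r in range(1, n + 1)]

def factorization_check(n):
    """Check that P^T Phi P equals the gcd matrix, where p_rs = 1 iff r divides s."""
    P = [[1 if s % r == 0 else 0 for s in range(1, n + 1)] for r in range(1, n + 1)]
    product = [[sum(P[k][r] * phi(k + 1) * P[k][s] for k in range(n))
                for s in range(n)] for r in range(n)]
    return product == gcd_matrix(n)

print(det(gcd_matrix(6)))  # 32
```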

39. Let $N(n)$ denote the number of non-singular $n \times n$ matrices each of whose elements is 0 or 1. Show that

$$N(n) = (2^2 - 1)(2^3 - 1)\cdots(2^n - 1)\,2^{\frac{1}{2}n(n-1)}.$$

Hence find the number of $n$-rowed determinants whose value is odd and each of whose elements is 0 or 1.
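The formula can be confirmed by brute force for small $n$. The sketch assumes that ‘non-singular’ here is to be understood over the field of integers modulo 2, so that $N(n)$ counts the 0/1 matrices with odd determinant, as the second part of the exercise suggests. The last function shows that the count of 0/1 matrices that are non-singular over the real field is different already for $n = 3$.

```python
from itertools import product
from math import prod

def int_det(M):
    """Integer determinant by cofactor expansion along the first row (tiny n only)."""
    n = len(M)
    if n == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * int_det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(n))

def all_01_matrices(n):
    """Yield every n x n matrix with entries 0 or 1."""
    for bits in product((0, 1), repeat=n * n):
        yield [list(bits[i * n:(i + 1) * n]) for i in range(n)]

def count_odd_determinants(n):
    """Number of n x n 0/1 matrices whose determinant is odd."""
    return sum(1 for M in all_01_matrices(n) if int_det(M) % 2 == 1)

def count_real_nonsingular(n):
    """Number of n x n 0/1 matrices whose determinant is non-zero."""
    return sum(1 for M in all_01_matrices(n) if int_det(M) != 0)

def formula(n):
    """(2^2 - 1)(2^3 - 1)...(2^n - 1) * 2^(n(n-1)/2)."""
    return prod(2 ** k - 1 for k in range(2, n + 1)) * 2 ** (n * (n - 1) // 2)

print([count_odd_determinants(n) for n in range(1, 4)],
      [formula(n) for n in range(1, 4)])   # [1, 6, 168] [1, 6, 168]
```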










IV 


LINEAR OPERATORS 

The object of the present chapter (which may be omitted at the 
first reading) is to elaborate the remark about ‘transformation of 
vectors’ made at the end of § 3.4. The discussion in that section 
already pointed to a close relation between matrices and linear 
substitutions or transformations. We now propose to study this 
relation in greater detail and to show how, when we attempt to 
construct an analytical apparatus for handling linear transformations,
we are naturally led to develop the calculus of matrices. In
particular, we shall discover that, contrary to the impression that
may have been derived from § 3.4, matrices and linear transformations
cannot be identified. What, in fact, we shall find is that a
linear transformation can be ‘represented’ by any matrix of an 
infinite class of matrices and that all such ‘representations’ have 
equal status. As it is desirable to frame the argument with the 
greatest possible degree of generality, we shall base our discussion 
on the idea of a linear manifold. 

4.1. Change of basis in a linear manifold 

Throughout the present section we denote by $\mathfrak{M}$ a given linear manifold, of finite dimensionality $n > 0$, over some specified field $\mathfrak{F}$. The zero element of $\mathfrak{M}$ is denoted by $0$.

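Theorem 4.1.1. Let $\mathfrak{B} = \{E_1, \ldots, E_n\}$ be a basis of $\mathfrak{M}$, and suppose that the elements $\bar E_1, \ldots, \bar E_n$ are defined by the equations

$$\bar E_j = \sum_{i=1}^{n} p_{ij} E_i \qquad (j = 1, \ldots, n). \tag{4.1.1}$$

Then $\bar{\mathfrak{B}} = \{\bar E_1, \ldots, \bar E_n\}$ is a basis of $\mathfrak{M}$ if and only if the matrix $P = (p_{ij})$ is non-singular.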

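If $t_1, \ldots, t_n$ are any scalars, then, by (4.1.1),

$$\sum_{j=1}^{n} t_j \bar E_j = \sum_{i=1}^{n} \Big( \sum_{j=1}^{n} p_{ij} t_j \Big) E_i. \tag{4.1.2}$$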




Suppose, in the first place, that $|P| \ne 0$. If $t_1, \ldots, t_n$ are scalars such that

$$\sum_{j=1}^{n} t_j \bar E_j = 0, \tag{4.1.3}$$

then, since $E_1, \ldots, E_n$ are linearly independent, we have by (4.1.2)

$$\sum_{j=1}^{n} p_{ij} t_j = 0 \qquad (i = 1, \ldots, n). \tag{4.1.4}$$

Hence, by Theorem 1.6.1 (p. 27), $t_1 = \cdots = t_n = 0$. The elements $\bar E_1, \ldots, \bar E_n$ are therefore linearly independent, and $\bar{\mathfrak{B}}$ is a basis.
Next, let $|P| = 0$. Then there exist scalars $t_1, \ldots, t_n$, not all zero, satisfying (4.1.4). In view of (4.1.2) these scalars also satisfy (4.1.3). Hence $\bar E_1, \ldots, \bar E_n$ are linearly dependent and so do not constitute a
basis.
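Theorem 4.1.1 reduces the question whether $\bar E_1, \ldots, \bar E_n$ form a basis to the value of $|P|$. As a hypothetical illustration, take $\mathfrak{M}$ to be the polynomials of degree at most 2 with basis $1, t, t^2$, so that column $j$ of $P$ lists the coefficients of $\bar E_j$. The Python sketch below tests $|P| \ne 0$ and, in the singular case, exhibits scalars $t_j$, not all zero, satisfying (4.1.3).

```python
from fractions import Fraction

def det(M):
    """Determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in M]
    n, d = len(a), Fraction(1)
    for c in range(n):
        p = next((r for r in range(c, n) if a[r][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d

def is_new_basis(P):
    """Theorem 4.1.1: bar-E_j = sum_i p_ij E_i form a basis iff |P| != 0."""
    return det(P) != 0

def combine(P, t):
    """Coordinates (w.r.t. E) of sum_j t_j bar-E_j, the right side of (4.1.2)."""
    return [sum(P[i][j] * t[j] for j in range(len(t))) for i in range(len(P))]

# Hypothetical example: E = (1, t, t^2); column j of P holds the coefficients of bar-E_j.
P_good = [[1, 1, 1],
          [0, 1, 2],
          [0, 0, 1]]      # bar-E = (1, 1 + t, (1 + t)^2)
P_bad = [[1, 0, 1],
         [0, 1, 1],
         [0, 0, 0]]       # bar-E = (1, t, 1 + t): the third is the sum of the first two

print(is_new_basis(P_good), is_new_basis(P_bad))  # True False
```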


We recall from § 2.4 that once a basis in a linear manifold has 
been chosen, the elements of that manifold can be represented by 
vectors. We shall now investigate the relation between any two 
such representations arising from two different choices of basis. 

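Theorem 4.1.2. (i) Let $\mathfrak{B}$, $\bar{\mathfrak{B}}$ be any two bases in $\mathfrak{M}$. If, for any $X \in \mathfrak{M}$,

$$x = \mathcal{R}(X; \mathfrak{B}), \qquad \bar x = \mathcal{R}(X; \bar{\mathfrak{B}}), \tag{4.1.5}$$

then there exists a unique non-singular $n \times n$ matrix $P$, independent of $X$, such that

$$\bar x = Px. \tag{4.1.6}$$

(ii) If $P$ is a given non-singular $n \times n$ matrix and $\mathfrak{B}$ is any given basis in $\mathfrak{M}$, then a second basis $\bar{\mathfrak{B}}$ can be found such that, whenever $x$, $\bar x$ satisfy (4.1.5), they also satisfy (4.1.6).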

This result states, loosely speaking, that when the elements of a
linear manifold are represented in terms of vectors, a change of
basis in $\mathfrak{M}$ is appropriately described by a matrix multiplication.
Moreover, each such multiplication corresponds to a change of
basis.

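To prove (i), let $\mathfrak{B} = \{E_1, \ldots, E_n\}$, $\bar{\mathfrak{B}} = \{\bar E_1, \ldots, \bar E_n\}$, and write

$$x = (x_1, \ldots, x_n)^T, \qquad \bar x = (\bar x_1, \ldots, \bar x_n)^T. \tag{4.1.7}$$

By (4.1.5) we have

$$X = \sum_{j=1}^{n} x_j E_j = \sum_{j=1}^{n} \bar x_j \bar E_j. \tag{4.1.8}$$

Let

$$E_j = \sum_{i=1}^{n} p_{ij} \bar E_i \qquad (j = 1, \ldots, n). \tag{4.1.9}$$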




Then, by Theorem 4.1.1, the matrix $P = (p_{ij})$ is non-singular. By (4.1.8) and (4.1.9) we have

$$X = \sum_{j=1}^{n} x_j E_j = \sum_{j=1}^{n} x_j \sum_{i=1}^{n} p_{ij} \bar E_i = \sum_{i=1}^{n} \bar E_i \sum_{j=1}^{n} p_{ij} x_j,$$

and so, since the coordinates of an element with respect to a basis are uniquely determined,

$$\bar x_i = \sum_{j=1}^{n} p_{ij} x_j \qquad (i = 1, \ldots, n).$$

These relations are, of course, equivalent to (4.1.6). Furthermore, if in addition to (4.1.6) we also have $\bar x = Qx$, then, for all $x$, $(P - Q)x = 0$; and it follows by Exercise 3.3.8 (p. 85) that $Q = P$.

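To prove (ii), let $P$ be a given non-singular $n \times n$ matrix and write $P^{-1} = (\bar p_{ij})$. Let $\mathfrak{B} = \{E_1, \ldots, E_n\}$ be any given basis in $\mathfrak{M}$, and put

$$\bar E_j = \sum_{i=1}^{n} \bar p_{ij} E_i \qquad (j = 1, \ldots, n).$$

Then, by Theorem 4.1.1, $\bar{\mathfrak{B}} = \{\bar E_1, \ldots, \bar E_n\}$ is also a basis of $\mathfrak{M}$. If $x$ and $\bar x$ are defined by the relations (4.1.5) and if the notation (4.1.7) is used again, then

$$X = \sum_{j=1}^{n} \bar x_j \bar E_j = \sum_{j=1}^{n} \bar x_j \sum_{i=1}^{n} \bar p_{ij} E_i = \sum_{i=1}^{n} E_i \sum_{j=1}^{n} \bar p_{ij} \bar x_j,$$

and so

$$x_i = \sum_{j=1}^{n} \bar p_{ij} \bar x_j \qquad (i = 1, \ldots, n);$$

in other words, $x = P^{-1}\bar x$, and the proof is complete.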
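Both halves of Theorem 4.1.2 can be followed step by step in a small computation. In the Python sketch below the elements of $\mathfrak{M}$ are modelled, as an assumption for illustration, by rational 3-vectors, and the two bases are our own choices. `change_matrix` builds $P$ exactly as in (4.1.9): column $j$ holds the coordinates of $E_j$ with respect to $\bar{\mathfrak{B}}$. `basis_from_matrix` follows the proof of (ii), forming $\bar E_j = \sum_i \bar p_{ij} E_i$ from the elements of $P^{-1}$.

```python
from fractions import Fraction

def solve(basis, v):
    """Coordinates c with sum_j c[j] * basis[j] == v (Gauss-Jordan over Q)."""
    n = len(v)
    M = [[Fraction(basis[j][i]) for j in range(n)] + [Fraction(v[i])] for i in range(n)]
    for c in range(n):
        p = next((r for r in range(c, n) if M[r][c] != 0), None)
        if p is None:
            raise ValueError("the given vectors do not form a basis")
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [x / piv for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][n] for i in range(n)]

def apply(P, x):
    return [sum(p * xi for p, xi in zip(row, x)) for row in P]

def change_matrix(B, B_bar):
    """Part (i): P with E_j = sum_i p_ij bar-E_i, as in (4.1.9)."""
    cols = [solve(B_bar, e) for e in B]
    n = len(B)
    return [[cols[j][i] for j in range(n)] for i in range(n)]

def basis_from_matrix(B, P):
    """Part (ii): bar-E_j = sum_i pbar_ij E_i, where (pbar_ij) = P^{-1}."""
    n = len(B)
    P_cols = [[P[i][j] for i in range(n)] for j in range(n)]
    inv_cols = [solve(P_cols, [int(i == j) for i in range(n)]) for j in range(n)]
    return [[sum(inv_cols[j][i] * B[i][k] for i in range(n)) for k in range(len(B[0]))]
            for j in range(n)]

B = [[1, 0, 0], [1, 1, 0], [1, 1, 1]]        # hypothetical basis E_1, E_2, E_3
B_bar = [[2, 1, 0], [0, 1, 1], [1, 0, 1]]    # hypothetical second basis
X = [3, -1, 4]
P = change_matrix(B, B_bar)
print(solve(B_bar, X) == apply(P, solve(B, X)))  # True
```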

4.2. Linear operators and their representations 

4.2.1. Let us consider two sets of objects $\mathfrak{M}$ and $\mathfrak{M}^*$.
Definition 4.2.1. If with each object $X$ in $\mathfrak{M}$ is associated a unique object $L(X)$ in $\mathfrak{M}^*$, then $L$ is called a mapping or a transformation of $\mathfrak{M}$ into $\mathfrak{M}^*$, and $\mathfrak{M}$ is said to be mapped into $\mathfrak{M}^*$ by $L$. Furthermore, for each $X \in \mathfrak{M}$, $L(X)$ is called the image of $X$.

Other terms such as operator, operation, or function are also used 
to describe L, and the idea they express is an exceedingly common 
one. The term ‘function’ is particularly apt; for L(X), as defined 
above, is in fact an obvious extension of the elementary notion of 
a function. Here the argument $X$ is taken from the set of objects
$\mathfrak{M}$ and the functional value $L(X)$ lies in the set $\mathfrak{M}^*$.


For example, let $\mathfrak{M}$ be the set of real and $\mathfrak{M}^*$ the set of complex
numbers. The function

L(X) = 

is then clearly a mapping of $\mathfrak{M}$ into $\mathfrak{M}^*$. Or, again, let $\mathfrak{M}$ be the set of points of a plane and $\mathfrak{M}^*$ the set of non-negative real numbers. If, for any $X \in \mathfrak{M}$, $L(X)$ is defined as the distance of $X$ from some fixed point in $\mathfrak{M}$ (the unit of measurement having been chosen), then $L$ is a mapping of $\mathfrak{M}$ into $\mathfrak{M}^*$.

Our main concern is, however, with linear manifolds and we shall throughout this chapter use the symbols $\mathfrak{M}$, $\mathfrak{M}^*$ to denote two linear manifolds, of finite non-zero dimensionality, over the same reference field $\mathfrak{F}$. Until the contrary is stated we shall also write $d(\mathfrak{M}) = m$, $d(\mathfrak{M}^*) = n$. We propose to study a particularly simple and interesting class of transformations of one linear manifold into another.

Definition 4.2.2. Let $L$ be a mapping of $\mathfrak{M}$ into $\mathfrak{M}^*$ and suppose that, for all $X, Y \in \mathfrak{M}$ and all $\alpha \in \mathfrak{F}$,

$$L(\alpha X) = \alpha L(X), \qquad L(X + Y) = L(X) + L(Y). \tag{4.2.1}$$

Then $L$ is said to be linear.

Similarly we speak of linear transformations, or linear operations, or linear operators, and we say that $\mathfrak{M}$ is mapped linearly into $\mathfrak{M}^*$ by $L$.

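The requirements (4.2.1) are referred to as the conditions of linearity, and an immediate consequence of them is that, for all $X_1, \ldots, X_k \in \mathfrak{M}$ and all $\alpha_1, \ldots, \alpha_k \in \mathfrak{F}$,

$$L(\alpha_1 X_1 + \cdots + \alpha_k X_k) = \alpha_1 L(X_1) + \cdots + \alpha_k L(X_k). \tag{4.2.2}$$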

Exercise 4.2.1. Consider the linear manifold of $m \times n$ matrices. Show that transposition is a linear mapping of this manifold into the linear manifold of $n \times m$ matrices.

Exercise 4.2.2. Let $L$ be a linear mapping of $\mathfrak{M}$ into $\mathfrak{M}^*$ and denote by $0$, $0^*$ the zero elements of $\mathfrak{M}$, $\mathfrak{M}^*$ respectively. Show that $L(0) = 0^*$.

It may be recalled that we have already encountered one important instance of a linear mapping. If $A$ is a given $n \times m$ matrix and $x$ is any vector of order $m$, then the function $L$, specified by the equation

$$L(x) = Ax, \tag{4.2.3}$$

is a linear mapping of $\mathfrak{V}_m$ into $\mathfrak{V}_n$. This fact can, of course, be verified immediately.




We know (by Theorem 2.4.3, p. 69) that every linear manifold
of finite dimensionality can be represented by a total vector space.
Equation (4.2.3) therefore suggests that it may be possible to
represent every linear mapping of $\mathfrak{M}$ into $\mathfrak{M}^*$ by means of a suitable
matrix. This is, indeed, the case, as is shown by the next result.†

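Theorem 4.2.1. Let $L$ be a linear mapping of $\mathfrak{M}$ into $\mathfrak{M}^*$ and let $\mathfrak{B}$, $\mathfrak{B}^*$ be any bases of $\mathfrak{M}$, $\mathfrak{M}^*$ respectively. If, for any $X \in \mathfrak{M}$,

$$x = \mathcal{R}(X; \mathfrak{B}), \qquad x^* = \mathcal{R}(L(X); \mathfrak{B}^*),$$

then there exists a unique $n \times m$ matrix $A$, independent of $X$, such that

$$x^* = Ax. \tag{4.2.4}$$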

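Let $\mathfrak{B} = \{E_1, \ldots, E_m\}$, $\mathfrak{B}^* = \{E_1^*, \ldots, E_n^*\}$. Each of $L(E_1), \ldots, L(E_m)$ is an element of $\mathfrak{M}^*$ and so is expressible as a linear combination of $E_1^*, \ldots, E_n^*$, say

$$L(E_j) = \sum_{i=1}^{n} a_{ij} E_i^* \qquad (j = 1, \ldots, m). \tag{4.2.5}$$

If $x = (x_1, \ldots, x_m)^T$, $x^* = (x_1^*, \ldots, x_n^*)^T$, then, by hypothesis,

$$X = \sum_{j=1}^{m} x_j E_j, \qquad L(X) = \sum_{i=1}^{n} x_i^* E_i^*.$$

Hence, using (4.2.2) and (4.2.5), we obtain

$$L(X) = L\Big(\sum_{j=1}^{m} x_j E_j\Big) = \sum_{j=1}^{m} x_j L(E_j) = \sum_{j=1}^{m} x_j \sum_{i=1}^{n} a_{ij} E_i^* = \sum_{i=1}^{n} E_i^* \sum_{j=1}^{m} a_{ij} x_j.$$

But the expression for $L(X)$ as a linear combination of $E_1^*, \ldots, E_n^*$ is unique; therefore

$$x_i^* = \sum_{j=1}^{m} a_{ij} x_j \qquad (i = 1, \ldots, n),$$

and this is equivalent to (4.2.4) with $A = (a_{ij})$. The uniqueness of $A$ is proved by the same argument as that used in the proof of
Theorem 4.1.2 (i).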
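The proof shows how $A$ is built: its $j$th column consists of the coordinates of $L(E_j)$ with respect to $\mathfrak{B}^*$, as in (4.2.5). As a hypothetical example (anticipating § 4.4), let $L$ be differentiation, taking polynomials of degree at most 3 ($m = 4$) into polynomials of degree at most 2 ($n = 3$), with the monomial bases in both manifolds. The Python sketch constructs $A$ in this way and checks (4.2.4) for one polynomial.

```python
def derivative(coeffs):
    """d/dt of sum_k c_k t^k; coefficients are listed from t^0 upwards."""
    return [k * c for k, c in enumerate(coeffs)][1:]

def representing_matrix(L, m):
    """Column j holds the coordinates of L(E_j), as in (4.2.5).

    E_j is the j-th monomial, given by its coordinate vector.
    """
    cols = [L([1 if k == j else 0 for k in range(m)]) for j in range(m)]
    n = len(cols[0])
    return [[cols[j][i] for j in range(m)] for i in range(n)]

def apply(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = representing_matrix(derivative, 4)
print(A)                 # [[0, 1, 0, 0], [0, 0, 2, 0], [0, 0, 0, 3]]

x = [5, -1, 4, 2]        # 5 - t + 4t^2 + 2t^3
print(apply(A, x))       # [-1, 8, 6], i.e. -1 + 8t + 6t^2
```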

The theorem just proved shows that every linear operator can be 
represented in terms of a matrix multiplication. The representation 

† Throughout the discussion below we make use of properties of matrices to deduce properties of linear operators. For a treatment of linear algebra in which the study of operators precedes that of matrices see, for example, Halmos, 14, and Lichnerowicz, 17.




is not, of course, uniquely determined since it depends on the arbitrary choice of bases in $\mathfrak{M}$ and $\mathfrak{M}^*$.

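Definition 4.2.3. The matrix $A$ in (4.2.4) is said to represent the linear operator $L$ with respect to the bases $\mathfrak{B}$, $\mathfrak{B}^*$ of $\mathfrak{M}$, $\mathfrak{M}^*$. In symbols,

$$A = \mathcal{M}(L; \mathfrak{B}, \mathfrak{B}^*).$$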

Suppose now that we take new bases in $\mathfrak{M}$ and $\mathfrak{M}^*$. With respect to these bases the linear operator $L$ will be represented by a new matrix; and we see, in fact, that $L$ possesses an infinity of matrix representations corresponding to the variable choice of bases in $\mathfrak{M}$ and $\mathfrak{M}^*$. We are therefore led to consider what relations exist between different matrices representing the same operator.

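Theorem 4.2.2. Let $L$ be a linear mapping of $\mathfrak{M}$ into $\mathfrak{M}^*$, and let $\mathfrak{B}$, $\bar{\mathfrak{B}}$ be any bases in $\mathfrak{M}$ and $\mathfrak{B}^*$, $\bar{\mathfrak{B}}^*$ any bases in $\mathfrak{M}^*$. If

$$A = \mathcal{M}(L; \mathfrak{B}, \mathfrak{B}^*), \qquad \bar A = \mathcal{M}(L; \bar{\mathfrak{B}}, \bar{\mathfrak{B}}^*),$$

then there exist non-singular matrices $P$, $Q$, of order $m$, $n$ respectively, such that

$$\bar A = QAP^{-1}. \tag{4.2.6}$$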

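The proof is almost immediate. If $X$ is any element in $\mathfrak{M}$, write

$$x = \mathcal{R}(X; \mathfrak{B}), \qquad \bar x = \mathcal{R}(X; \bar{\mathfrak{B}}),$$
$$x^* = \mathcal{R}(L(X); \mathfrak{B}^*), \qquad \bar x^* = \mathcal{R}(L(X); \bar{\mathfrak{B}}^*).$$

Then, by hypothesis,

$$x^* = Ax, \qquad \bar x^* = \bar A \bar x.$$

Moreover, by Theorem 4.1.2 (i), there exist non-singular matrices $P$, $Q$, of order $m$, $n$ respectively, such that

$$\bar x = Px, \qquad \bar x^* = Qx^*.$$

Hence

$$\bar x^* = Qx^* = QAx = QAP^{-1}\bar x,$$

and (4.2.6) follows by virtue of the uniqueness of $\bar A$.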
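Relation (4.2.6) can be checked directly. In the Python sketch below, $L$ maps $\mathbf{Q}^3$ into $\mathbf{Q}^2$ and is represented by a matrix $A$ of our own choosing with respect to the unit-vector bases. The barred bases are also hypothetical. $\bar A$ is computed straight from the definition, as the coordinates of $L(\bar E_j)$ with respect to $\bar{\mathfrak{B}}^*$. $P$ and $Q$ are built as in Theorem 4.1.2 (i), and the sketch confirms $\bar A = QAP^{-1}$.

```python
from fractions import Fraction

def solve(basis, v):
    """Coordinates of v with respect to basis (a list of vectors), over Q."""
    n = len(v)
    M = [[Fraction(basis[j][i]) for j in range(n)] + [Fraction(v[i])] for i in range(n)]
    for c in range(n):
        p = next((r for r in range(c, n) if M[r][c] != 0), None)
        if p is None:
            raise ValueError("the given vectors do not form a basis")
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [x / piv for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][n] for i in range(n)]

def coords_matrix(basis, vectors):
    """Matrix whose j-th column is the coordinate vector of vectors[j] w.r.t. basis."""
    cols = [solve(basis, v) for v in vectors]
    return [[col[i] for col in cols] for i in range(len(basis))]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def inverse(M):
    """Column j of M^{-1} is the solution y of My = e_j."""
    n = len(M)
    unit = [[int(i == j) for i in range(n)] for j in range(n)]
    columns = [[M[i][j] for i in range(n)] for j in range(n)]
    return coords_matrix(columns, unit)

# Hypothetical data: A represents L with respect to the unit-vector bases.
A = [[1, 2, 0], [0, -1, 3]]
E3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
E2 = [[1, 0], [0, 1]]
B_bar = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]    # second basis of Q^3
B_bar_star = [[1, 1], [1, -1]]               # second basis of Q^2

def L(v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

A_bar = coords_matrix(B_bar_star, [L(e) for e in B_bar])  # definition of A-bar
P = coords_matrix(B_bar, E3)          # bar-x = P x
Q = coords_matrix(B_bar_star, E2)     # bar-x* = Q x*
print(A_bar == matmul(matmul(Q, A), inverse(P)))   # True
```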

4.2.2. The situation described by Theorem 4.2.2 becomes even more interesting if we take $\mathfrak{M}$ and $\mathfrak{M}^*$ as identical linear manifolds. In that case both the variable $X$ and the functional value $L(X)$ are elements of $\mathfrak{M}$, and we speak now of a linear mapping, or linear transformation, of $\mathfrak{M}$ into itself.† The mapping $L(X) = X$ (for all $X \in \mathfrak{M}$) is called the identical transformation of $\mathfrak{M}$.

† It is important to stress that if $L$ is a linear mapping of $\mathfrak{M}$ into itself, then to each element $X \in \mathfrak{M}$ corresponds a unique element $L(X) \in \mathfrak{M}$. On the other hand, given $X' \in \mathfrak{M}$, there may be no $X \in \mathfrak{M}$ such that $L(X) = X'$; or there may possibly be more than one $X$ satisfying this relation.




Exercise 4.2.3. Show that the operator which transforms every element of a linear manifold $\mathfrak{M}$ into the zero element is a linear mapping of $\mathfrak{M}$ into itself.

By specializing Theorems 4.2.1 and 4.2.2 we could obtain a number of results about transformations of $\mathfrak{M}$ into itself. But it is essential that the reader should become thoroughly familiar with the idea of representation of operators by matrices and we prefer, therefore, to give an independent derivation of the theorems relating to linear transformations of a linear manifold into itself. Throughout the remainder of the present section we shall write $d(\mathfrak{M}) = n$.

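Theorem 4.2.3. Let $L$ be a linear transformation of $\mathfrak{M}$ into itself, and let $\mathfrak{B}$ be any basis of $\mathfrak{M}$. If, for any $X \in \mathfrak{M}$, we write

$$x = \mathcal{R}(X; \mathfrak{B}), \qquad x' = \mathcal{R}(L(X); \mathfrak{B}),$$

then there exists a unique $n \times n$ matrix $A$, independent of $X$, such that

$$x' = Ax. \tag{4.2.7}$$

Let $\mathfrak{B} = \{E_1, \ldots, E_n\}$ and write

$$L(E_j) = \sum_{i=1}^{n} a_{ij} E_i \qquad (j = 1, \ldots, n), \tag{4.2.8}$$

$$x = (x_1, \ldots, x_n)^T, \qquad x' = (x_1', \ldots, x_n')^T.$$

Then

$$\sum_{i=1}^{n} x_i' E_i = L(X) = L\Big(\sum_{j=1}^{n} x_j E_j\Big) = \sum_{j=1}^{n} x_j \sum_{i=1}^{n} a_{ij} E_i = \sum_{i=1}^{n} E_i \sum_{j=1}^{n} a_{ij} x_j,$$

so that

$$x_i' = \sum_{j=1}^{n} a_{ij} x_j \qquad (i = 1, \ldots, n).$$

This is, in fact, (4.2.7) with $A = (a_{ij})$; and the uniqueness of $A$ follows in the usual manner.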


Theorem 4.2.3 shows, in particular, that a linear transformation of $\mathfrak{M}$ into itself can be represented by a linear transformation of $\mathfrak{V}_n$ into itself.

We may also remark that the matrix equation $x' = Ax$, where $A$ is non-singular, can be interpreted in two different ways. On the one hand, we may think of $\mathfrak{M}$ as undergoing a transformation which changes the element $X$, represented by $x$, into the element $L(X)$, represented by $Ax$. On the other hand, we can consider the effect produced by a change of basis in $\mathfrak{M}$. In view of Theorem 4.1.2 (ii) (p. 112) we know that if $x$ represents $X \in \mathfrak{M}$ with respect to $\mathfrak{B}$, then a second basis $\bar{\mathfrak{B}}$ can be found such that $Ax$ represents the same element $X$ with respect to $\bar{\mathfrak{B}}$. The transition from $x$ to $Ax$ can thus be regarded as resulting either from a linear transformation of $\mathfrak{M}$ or from a change of basis in $\mathfrak{M}$. The same twofold aspect of matrix transformations appears in projective geometry when we consider collineations and changes of the coordinate system.

The terminology and notation of Definition 4.2.3 can, of course, 
be suitably modified to meet our present requirements. 

Definition 4.2.4. The matrix $A$ in (4.2.7) is said to represent the linear transformation $L$ with respect to the basis $\mathfrak{B}$ of $\mathfrak{M}$. In symbols,

$$A = \mathcal{M}(L; \mathfrak{B}).$$

It is easy to obtain a converse of Theorem 4.2.3. This shows, in particular, that, for a fixed choice of basis, there is a biunique correspondence between the linear transformations of $\mathfrak{M}$ into itself and the matrices representing them.

Theorem 4.2.4. Let $A$ be an $n \times n$ matrix and $\mathfrak{B}$ a basis of $\mathfrak{M}$. Then there exists a unique linear transformation $L$ of $\mathfrak{M}$ into itself which is represented by $A$ with respect to $\mathfrak{B}$.

Let $A = (a_{ij})$ and $\mathfrak{B} = \{E_1, \ldots, E_n\}$, and let the elements $L(E_1), \ldots, L(E_n)$ of $\mathfrak{M}$ be defined by (4.2.8). Then, in view of the linearity of $L$, $L(X)$ is uniquely determined for every $X \in \mathfrak{M}$. Using now the equations appearing in the proof of Theorem 4.2.3 we can easily show that $A = \mathcal{M}(L; \mathfrak{B})$.

Suppose now that we also have $A = \mathcal{M}(M; \mathfrak{B})$, where $M$ is some linear transformation of $\mathfrak{M}$ into itself. Then, if $X \in \mathfrak{M}$ and $x = \mathcal{R}(X; \mathfrak{B})$,

$$\mathcal{R}(L(X); \mathfrak{B}) = Ax = \mathcal{R}(M(X); \mathfrak{B}).$$

Hence $L(X) = M(X)$ for all $X \in \mathfrak{M}$, i.e. $L = M$ as required.

Exercise 4.2.4. Write out the details of the proof sketched above.

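Exercise 4.2.5. Let $L_1$, $L_2$ be linear transformations of $\mathfrak{M}$ into itself, and let the transformations $\alpha L_1$, $L_1 + L_2$, $L_2 L_1$ be defined respectively by the formulae

$$(\alpha L_1)(X) = \alpha L_1(X), \qquad (L_1 + L_2)(X) = L_1(X) + L_2(X), \qquad (L_2 L_1)(X) = L_2\{L_1(X)\} \qquad (X \in \mathfrak{M}).$$

Show that $\alpha L_1$, $L_1 + L_2$, $L_2 L_1$ are all linear.

If $L_1$, $L_2$ are represented, with respect to a basis $\mathfrak{B}$, by the matrices $A_1$, $A_2$ respectively, show that $\alpha L_1$, $L_1 + L_2$, $L_2 L_1$ are represented by $\alpha A_1$, $A_1 + A_2$, $A_2 A_1$ respectively.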
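The second part of the exercise can be spot-checked on a concrete example; this is an illustration, not a proof. In the Python sketch below, $\mathfrak{M}$ is taken (hypothetically) to be the polynomials of degree at most 2 with basis $1, t, t^2$. $L_1$ is differentiation and $L_2$ is the substitution $p(t) \mapsto p(2t)$. The sketch confirms that the composite $L_2 L_1$ (first $L_1$, then $L_2$) is represented by $A_2 A_1$ and not, in this example, by $A_1 A_2$.

```python
N = 3   # hypothetical setting: polynomials of degree <= 2, basis (1, t, t^2)

def L1(c):
    """Differentiation; the coefficient of t^2 in the result is 0."""
    return [k * c[k] for k in range(1, N)] + [0]

def L2(c):
    """The substitution p(t) -> p(2t): the coefficient of t^k is multiplied by 2^k."""
    return [c[k] * 2 ** k for k in range(N)]

def rep(L):
    """Matrix of L w.r.t. the monomial basis: column j holds L(E_j), as in (4.2.8)."""
    cols = [L([int(k == j) for k in range(N)]) for j in range(N)]
    return [[cols[j][i] for j in range(N)] for i in range(N)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(N)) for j in range(N)] for i in range(N)]

A1, A2 = rep(L1), rep(L2)
A_comp = rep(lambda c: L2(L1(c)))                        # L2 L1: first L1, then L2
A_sum = rep(lambda c: [a + b for a, b in zip(L1(c), L2(c))])
print(A_comp == matmul(A2, A1), A_comp == matmul(A1, A2))   # True False
```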




In comparing matrix representations of a linear operator with 
respect to different bases we encounter a type of relation between 
matrices which will play a prominent part in our subsequent 
discussion. 

Definition 4.2.5. If $A$ and $B$ are square matrices, and

$$B = S^{-1}AS, \tag{4.2.9}$$

where $S$ is some non-singular matrix, then $B$ is said to be obtained from $A$ by a similarity transformation, in which $S$ is the transforming matrix. Furthermore, $B$ is said to be similar to $A$.

Since (4.2.9) may be rewritten in the form $A = (S^{-1})^{-1} B S^{-1}$, it follows that if $B$ is similar to $A$, then $A$ is similar to $B$. There is, therefore, no ambiguity in speaking simply of two matrices as being similar.

The next theorem states, roughly speaking, that two matrices are
similar if and only if they represent the same linear transformation
of $\mathfrak{M}$ into itself.

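Theorem 4.2.5. (Representation theorem for linear transformations)

(i) Let $L$ be a linear transformation of $\mathfrak{M}$ into itself. If $A$, $\bar A$ are the $n \times n$ matrices representing $L$ with respect to the bases $\mathfrak{B}$, $\bar{\mathfrak{B}}$ of $\mathfrak{M}$ respectively, then $A$, $\bar A$ are similar.

(ii) Let $A$, $\bar A$ be similar $n \times n$ matrices, and let $\mathfrak{B}$ be any basis of $\mathfrak{M}$. Then a second basis $\bar{\mathfrak{B}}$ and a linear transformation $L$ of $\mathfrak{M}$ into itself can be found such that

$$A = \mathcal{M}(L; \mathfrak{B}), \qquad \bar A = \mathcal{M}(L; \bar{\mathfrak{B}}).$$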

To prove (i), let $X \in \mathfrak{M}$ and write

$$x = \mathcal{R}(X; \mathfrak{B}), \qquad \bar x = \mathcal{R}(X; \bar{\mathfrak{B}}),$$
$$x' = \mathcal{R}(L(X); \mathfrak{B}), \qquad \bar x' = \mathcal{R}(L(X); \bar{\mathfrak{B}}).$$

By hypothesis we have

$$x' = Ax, \qquad \bar x' = \bar A \bar x.$$

Furthermore, by Theorem 4.1.2 (i) (p. 112), there exists a non-singular matrix $P$ such that

$$\bar x = Px, \qquad \bar x' = Px'.$$

From these relations we infer that

$$\bar x' = Px' = PAx = PAP^{-1}\bar x,$$

and it follows that

$$\bar A = PAP^{-1}; \tag{4.2.10}$$

thus $A$, $\bar A$ are similar.

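Assume, next, that $A$, $\bar A$ are given similar matrices, so that (4.2.10) is satisfied for some non-singular matrix $P$. If $\mathfrak{B}$ is a given basis, then, by Theorem 4.2.4, there exists a linear transformation $L$ of $\mathfrak{M}$ into itself such that $A = \mathcal{M}(L; \mathfrak{B})$. This means, of course, that if $X \in \mathfrak{M}$ and

$$x = \mathcal{R}(X; \mathfrak{B}), \qquad x' = \mathcal{R}(L(X); \mathfrak{B}),$$

then $x' = Ax$.

Again, by Theorem 4.1.2 (ii) (p. 112), $\mathfrak{M}$ possesses a second basis $\bar{\mathfrak{B}}$ such that, if

$$\bar x = \mathcal{R}(X; \bar{\mathfrak{B}}), \qquad \bar x' = \mathcal{R}(L(X); \bar{\mathfrak{B}}),$$

then $\bar x = Px$, $\bar x' = Px'$.

Hence, in view of (4.2.10), $\bar x' = \bar A \bar x$, i.e. $\bar A = \mathcal{M}(L; \bar{\mathfrak{B}})$. The proof is therefore complete.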

The preceding theorems show that in dealing with linear transformations of a linear manifold into itself we encounter a situation of a type with which we are already familiar from the study of representations of a linear manifold by means of vector spaces.† A linear transformation of $\mathfrak{M}$ into itself exists quite independently of any representation; but it possesses, at the same time, infinitely many different representations (in terms of matrices) none of which has special precedence over the others. Our choice of representation (that is, essentially our choice of basis in $\mathfrak{M}$) is therefore governed in any particular problem solely by considerations of convenience.

4.2.3. Since in the problems that actually arise we deal more 
often with vector spaces than with linear manifolds, it is useful to 
reformulate some of the results obtained in § 4.2.2. 

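We note, in the first place, that if $L$ is a linear transformation of $\mathfrak{V}_n$ into itself and $\mathfrak{X} = \{x_1, \ldots, x_n\}$ is a basis of $\mathfrak{V}_n$, then there exists a (unique) $n \times n$ matrix $A$ such that

$$A = \mathcal{M}(L; \mathfrak{X}). \tag{4.2.11}$$

This means, in fact, that if

$$x = \alpha_1 x_1 + \cdots + \alpha_n x_n, \qquad L(x) = \beta_1 x_1 + \cdots + \beta_n x_n,$$

then

$$(\beta_1, \ldots, \beta_n)^T = A(\alpha_1, \ldots, \alpha_n)^T. \tag{4.2.12}$$

† See § 2.4.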




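A more convenient way of characterizing the representing matrix $A$ follows from (4.2.8) (p. 117). This set of equations shows that if

$$L(x_j) = \sum_{i=1}^{n} a_{ij} x_i \qquad (j = 1, \ldots, n), \tag{4.2.13}$$

then $A = (a_{ij})$.

In the particular case when $\mathfrak{X}$ is taken as $\mathfrak{E} = \{e_1, \ldots, e_n\}$, (4.2.11) means, then, that

$$L(e_j) = \sum_{i=1}^{n} a_{ij} e_i \qquad (j = 1, \ldots, n),$$

so that $L(e_j)$ is simply the $j$th column of $A$. Writing $x = (x_1, \ldots, x_n)^T$ and using Theorem 3.3.6 (p. 84) we see that

$$L(x) = L(x_1 e_1 + \cdots + x_n e_n) = \sum_{j=1}^{n} x_j L(e_j) = Ax.$$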

We have, therefore, the following result:

Theorem 4.2.6. If $L$ is a linear mapping of $\mathfrak{V}_n$ into itself, and $A = \mathcal{M}(L; \mathfrak{E})$, then, for all $x \in \mathfrak{V}_n$, $L(x) = Ax$.
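Theorem 4.2.6 gives a practical recipe: to find the matrix of a linear map of $\mathfrak{V}_n$ into itself with respect to $\mathfrak{E}$, apply the map to $e_1, \ldots, e_n$ and use the images as columns. The Python sketch below does this for a hypothetical map given by a formula and confirms $L(x) = Ax$ for a sample vector.

```python
def L(x):
    """A hypothetical linear map of V_3 into itself, given by a formula."""
    x1, x2, x3 = x
    return [x1 + 2 * x2, x2 - x3, 3 * x1]

def matrix_wrt_unit_vectors(L, n):
    """A = M(L; E): the j-th column of A is L(e_j)."""
    cols = [L([int(i == j) for i in range(n)]) for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]

def apply(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = matrix_wrt_unit_vectors(L, 3)
print(A)                                     # [[1, 2, 0], [0, 1, -1], [3, 0, 0]]
print(L([4, -2, 5]), apply(A, [4, -2, 5]))   # [0, -7, 12] [0, -7, 12]
```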

Exercise 4.2.6. Deduce Theorem 4.2.6 from (4.2.12). 

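Theorem 4.2.7. Let $L$ be a linear mapping of $\mathfrak{V}_n$ into itself. Let $\mathfrak{X} = \{x_1, \ldots, x_n\}$ and $\bar{\mathfrak{X}} = \{\bar x_1, \ldots, \bar x_n\}$ be any two bases in $\mathfrak{V}_n$, and suppose that

$$\bar x_j = \sum_{i=1}^{n} p_{ij} x_i \qquad (j = 1, \ldots, n).$$

If

$$A = \mathcal{M}(L; \mathfrak{X}), \qquad \bar A = \mathcal{M}(L; \bar{\mathfrak{X}}),$$

then

$$\bar A = P^{-1}AP,$$

where $P = (p_{ij})$.†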

This result follows immediately from Theorem 4.2.5 (i) and the 
construction of P in the proof of Theorem 4.1.2 (i), when an obvious 
change of notation has been made. 

We may specialize Theorem 4.2.7 still further by taking $\mathfrak{E}$ to
be one of the given bases.

Theorem 4.2.8. Let $\mathfrak{X} = \{x_1, \ldots, x_n\}$ be a basis of $\mathfrak{V}_n$ and let $X$ be the $n \times n$ matrix having $x_1, \ldots, x_n$ (in that order) as its columns. If the $n \times n$ matrix $A$ represents the linear transformation $L$ of $\mathfrak{V}_n$ into itself with respect to $\mathfrak{E}$, then $X^{-1}AX$ represents $L$ with respect to $\mathfrak{X}$.


† The matrix $P$ is non-singular by virtue of Theorem 4.1.1.




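We apply Theorem 4.2.7 with $\mathfrak{E}$, $\mathfrak{X}$ in place of $\mathfrak{X}$, $\bar{\mathfrak{X}}$ respectively. This shows that if

$$x_j = \sum_{i=1}^{n} p_{ij} e_i \qquad (j = 1, \ldots, n), \tag{4.2.14}$$

then $\mathcal{M}(L; \mathfrak{X}) = P^{-1}AP$, where $P = (p_{ij})$. But (4.2.14) states simply that $x_j$ is the $j$th column of $P$ $(j = 1, \ldots, n)$. Hence

$$X = \begin{pmatrix} p_{11} & \cdots & p_{1n} \\ \vdots & & \vdots \\ p_{n1} & \cdots & p_{nn} \end{pmatrix} = P,$$

and the assertion follows.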
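Theorem 4.2.8 is easy to try out. In the Python sketch below, the matrix $A$ and the basis vectors forming the columns of $X$ are hypothetical choices of ours. Because $x_1$, $x_2$ happen to satisfy $Ax_1 = 3x_1$ and $Ax_2 = x_2$, the representing matrix $X^{-1}AX$ takes the very simple form $\mathrm{dg}\{3, 1\}$. This anticipates the search for simple similar matrices discussed in § 4.2.4.

```python
from fractions import Fraction

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def inverse(M):
    """Inverse of a non-singular square matrix by Gauss-Jordan elimination over Q."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for c in range(n):
        p = next((r for r in range(c, n) if aug[r][c] != 0), None)
        if p is None:
            raise ValueError("matrix is singular")
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                f = aug[r][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

A = [[2, 1], [1, 2]]     # hypothetical: L(x) = Ax, so A = M(L; E)
X = [[1, 1], [1, -1]]    # columns x_1 = (1, 1)^T and x_2 = (1, -1)^T form a basis
A_X = matmul(matmul(inverse(X), A), X)
print(A_X == [[3, 0], [0, 1]])   # True
```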


4.2.4. Now that the relations between matrices and linear 
operators have been discussed at some length, it becomes clear that 
the properties a square matrix may possess are of two distinct kinds, 
which might be termed ‘invariant’ and ‘non-invariant’—an 
‘invariant’ property being one that is shared by all matrices 
representing (with respect to suitable bases) the same linear 
operator as the given matrix. In view of Theorem 4.2.5 this simply 
means that a property is invariant if it is possessed by an entire 
class of similar matrices. It is, for instance, obvious that similar 
matrices are either all singular or all non-singular, and singularity 
is thus seen to be an invariant property. On the other hand, the 
symmetry of a matrix $A$, i.e. the property $A^T = A$, is clearly non-invariant.
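The distinction can be seen in a two-line example. The Python sketch below uses matrices of our own choosing. It transforms a symmetric matrix $A$ by $S^{-1}AS$ as in (4.2.9) and shows that the determinant (and hence singularity or non-singularity) is preserved, whereas symmetry is lost. A single example of this kind illustrates, but of course does not prove, the invariance of a property.

```python
from fractions import Fraction

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def inverse2(M):
    """Inverse of a non-singular 2 x 2 matrix."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

def det2(M):
    (a, b), (c, d) = M
    return a * d - b * c

def is_symmetric(M):
    return all(M[i][j] == M[j][i] for i in range(2) for j in range(2))

A = [[1, 2], [2, 1]]                     # hypothetical symmetric matrix
S = [[1, 1], [0, 1]]                     # hypothetical transforming matrix
B = matmul(matmul(inverse2(S), A), S)    # B = S^{-1} A S, as in (4.2.9)
print(det2(A), det2(B), is_symmetric(A), is_symmetric(B))   # -3 -3 True False
```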

Invariant properties of matrices are fundamental since they 
express intrinsic characteristics of the underlying linear operators, 
whereas non-invariant properties depend on the arbitrary choice 
of a basis in a linear manifold and so are, as it were, accidental. To 
determine the invariant features of a given matrix A (that is, to 
determine the intrinsic characteristics of the operator represented 
by A) we endeavour to find and then to examine some matrix A, 
similar to A, and having at the same time as simple a form as 
possible. This procedure leads to the discussion of canonical forms 
which will be undertaken in Chapter X. 

In projective geometry, too, similarity transformations play a prominent part. Projective geometry is the study of ‘projective space’. Although this space is not a linear manifold, its properties can be made to depend in a simple way on the properties of such manifolds. For example, projective space can be represented by means of systems of ‘projective coordinates’; each such system is derived from a basis of a vector space, and the change from one system to another can be described in terms of matrices. In discussing projective geometry we have often to deal with collineations, i.e. with linear transformations of projective space into itself. These transformations have properties closely resembling those of linear transformations of a linear manifold; in particular, they can be represented by matrices, and two matrices represent the same collineation with respect to two systems of coordinates if and only if each is similar to a scalar multiple of the other. It follows that in investigating the nature of a given collineation it is advantageous to choose the system of coordinates in such a way that the representing matrix assumes as simple a form as possible; and this amounts to finding a matrix having a simple form and similar to the given matrix. Thus, no matter whether our interest lies primarily in pure matrix theory or in geometry, we are led naturally to the recognition of the importance of similarity transformations and to the systematic study of these transformations.†

4.3. Isomorphisms and automorphisms of linear manifolds 

The reader has probably already noticed the close resemblance between the notion of an isomorphism (as specified in Definition 2.4.1, p. 58) and that of a linear transformation. An isomorphism between two linear manifolds $\mathfrak{M}$ and $\mathfrak{M}^*$ is obviously a linear transformation of $\mathfrak{M}$ into $\mathfrak{M}^*$ (and equally of $\mathfrak{M}^*$ into $\mathfrak{M}$). On the other hand, a linear transformation of $\mathfrak{M}$ into $\mathfrak{M}^*$ need not be an isomorphism, as may be seen, for example, by considering the linear transformation which maps every element of $\mathfrak{M}$ into the zero element of $\mathfrak{M}^*$. If, however, in addition to the requirements imposed on $\mathfrak{M}$, $\mathfrak{M}^*$, and $L$ in Definition 4.2.2 (p. 114) we also assume that to every element $X'$ of $\mathfrak{M}^*$ there corresponds one and only one element $X$ of $\mathfrak{M}$ such that $L(X) = X'$, then the linear transformation $L$ of $\mathfrak{M}$ into $\mathfrak{M}^*$ becomes an isomorphism, and $\mathfrak{M}$, $\mathfrak{M}^*$ are isomorphic.

A somewhat weaker requirement is specified in the next definition.

† For a discussion of projective geometry based on the methods of linear algebra the reader may be referred to Semple and Kneebone, Algebraic Projective Geometry, or, for a more comprehensive treatment, to Todd, Projective and Analytic Geometry.




Definition 4.3.1. A linear transformation (or linear mapping) of $\mathfrak{M}$ onto $\mathfrak{M}^*$ is a linear transformation $L$, of $\mathfrak{M}$ into $\mathfrak{M}^*$, having the additional property that, corresponding to every $X' \in \mathfrak{M}^*$, there exists at least one $X \in \mathfrak{M}$ which satisfies the equation $L(X) = X'$.†

In particular, we may speak of a linear transformation of $\mathfrak{M}$ onto itself.

We note at once that an isomorphism between $\mathfrak{M}$ and $\mathfrak{M}^*$ is a mapping of $\mathfrak{M}$ onto $\mathfrak{M}^*$. The converse need not be the case. Thus, let $\mathfrak{M} = \mathfrak{V}_3$, $\mathfrak{M}^* = \mathfrak{V}_2$, and define $L$ as the transformation which associates with every vector $(x_1, x_2, x_3)$ of $\mathfrak{V}_3$ the vector $(x_1, x_2)$ of $\mathfrak{V}_2$. It is easily verified that $L$ is a linear mapping of $\mathfrak{V}_3$ onto $\mathfrak{V}_2$, but it is obviously not an isomorphism since it does not set up a biunique correspondence between the vectors of $\mathfrak{V}_3$ and those of $\mathfrak{V}_2$. If, however, the dimensionalities of $\mathfrak{M}$ and $\mathfrak{M}^*$ are equal, then a linear mapping of $\mathfrak{M}$ onto $\mathfrak{M}^*$ and an isomorphism between $\mathfrak{M}$ and $\mathfrak{M}^*$ are equivalent concepts. This is shown by the next theorem.

Theorem 4.3.1. If $\mathfrak{M}$, $\mathfrak{M}^*$ are linear manifolds having the same (finite) dimensionality and if $L$ is a linear transformation of $\mathfrak{M}$ onto $\mathfrak{M}^*$, then $L$ is an isomorphism.

Let $d(\mathfrak{M}) = d(\mathfrak{M}^*) = n$ and let $\{E_1, \ldots, E_n\}$ be any basis of $\mathfrak{M}$. If $X'$ is any element of $\mathfrak{M}^*$ then there exists, by hypothesis, at least one element $X$ in $\mathfrak{M}$ such that $X' = L(X)$. Writing

$$X = \alpha_1 E_1 + \cdots + \alpha_n E_n,$$

we therefore obtain

$$X' = \alpha_1 L(E_1) + \cdots + \alpha_n L(E_n).$$

Thus $L(E_1), \ldots, L(E_n)$ are generators of $\mathfrak{M}^*$ and, since $d(\mathfrak{M}^*) = n$, it follows that these elements constitute a basis and so are linearly independent.

Now let $Y' \in \mathfrak{M}^*$ and suppose that $Y_1, Y_2 \in \mathfrak{M}$ and $L(Y_1) = Y'$, $L(Y_2) = Y'$. Then $L(Y_1 - Y_2) = 0^*$, where $0^*$ denotes the zero element of $\mathfrak{M}^*$. Writing

$$Y_1 - Y_2 = \beta_1 E_1 + \cdots + \beta_n E_n,$$

we obtain

$$\beta_1 L(E_1) + \cdots + \beta_n L(E_n) = 0^*,$$

and since $L(E_1), \ldots, L(E_n)$ are linearly independent this implies that $\beta_1 = \cdots = \beta_n = 0$. Hence $Y_1 = Y_2$, and thus corresponding to any element $Y' \in \mathfrak{M}^*$ there exists precisely one element $Y \in \mathfrak{M}$ such that $L(Y) = Y'$. The mapping $L$ is therefore an isomorphism.

† Thus ‘mapping onto’ is a special case of ‘mapping into’.




We are particularly interested in the case when $\mathfrak{M} = \mathfrak{M}^*$.

Definition 4.3.2. An automorphism of a linear manifold $\mathfrak{M}$ is an isomorphism of $\mathfrak{M}$ with itself.

The mapping $L(X) = X$ (for all $X \in \mathfrak{M}$) is called the identical automorphism of $\mathfrak{M}$.

Exercise 4.3.1. Show that the biunique correspondence between the elements of $\mathfrak{V}_2$, specified by the scheme

is an automorphism of $\mathfrak{V}_2$.

As a special case of Theorem 4.3.1 we have the following result. 

Theorem 4.3.2. A linear transformation of a finite-dimensional 
linear manifold onto itself is an automorphism. 

A criterion for deciding whether a linear transformation $L$ of $\mathfrak{M}$ into itself is an automorphism can be given in terms of matrix representations of $L$. We need to observe in the first place that the (infinitely many) matrices representing $L$ are either all singular or all non-singular. For any two such matrices $A$, $\bar A$ are similar by Theorem 4.2.5, i.e. they are connected by a relation of the form $\bar A = PAP^{-1}$; and hence, by Theorems 3.5.1 (p. 87) and 3.6.3 (i) (p. 93), $|\bar A| = |A|$.

Theorem 4.3.3. A linear transformation $L$ of a finite-dimensional linear manifold $\mathfrak{M}$ into itself is an automorphism if and only if the matrices representing $L$ are non-singular.

Let $\mathfrak{B} = \{E_1, \ldots, E_n\}$ be any basis of $\mathfrak{M}$, and write

$$L(E_j) = \sum_{i=1}^{n} a_{ij} E_i \qquad (j = 1, \ldots, n).$$

Then $A = (a_{ij}) = \mathcal{M}(L; \mathfrak{B})$. Suppose first that $|A| \ne 0$. Let $X'$ be any given element of $\mathfrak{M}$ and write $x' = \mathcal{R}(X'; \mathfrak{B})$. Let $X \in \mathfrak{M}$ be defined by the relation $A^{-1}x' = \mathcal{R}(X; \mathfrak{B})$. Then

$$\mathcal{R}(X'; \mathfrak{B}) = A\,\mathcal{R}(X; \mathfrak{B}),$$

and it follows from Theorem 4.2.3 (p. 117) that $L(X) = X'$. Thus $L$ is a linear mapping of $\mathfrak{M}$ onto itself; and so, by Theorem 4.3.2, it is an automorphism.

If, on the other hand, $|A| = 0$, then $L(E_1), \ldots, L(E_n)$ are linearly dependent by virtue of Theorem 4.1.1, i.e. there exist scalars $\alpha_1, \ldots, \alpha_n$, not all 0, such that

$$\alpha_1 L(E_1) + \cdots + \alpha_n L(E_n) = 0,$$

where $0$ is the zero element of $\mathfrak{M}$. Thus, for some $X \ne 0$, $L(X) = 0$. But clearly $L(0) = 0$, and therefore $L$ is not an automorphism of $\mathfrak{M}$. The proof is now complete.
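The two halves of the proof translate directly into a computation on coordinate vectors. In the Python sketch below, the matrices are hypothetical examples. When $|A| \ne 0$, `preimage` finds the coordinates $A^{-1}x'$ of an element $X$ with $L(X) = X'$. When $|A| = 0$, `kernel_vector` produces the coordinates of a non-zero $X$ with $L(X) = 0$, which shows that $L$ is not an automorphism. Both rely on reduction to row echelon form over the rationals.

```python
from fractions import Fraction

def rref(A):
    """Reduced row echelon form over Q; returns the matrix and its pivot columns."""
    M = [[Fraction(x) for x in row] for row in A]
    n_rows, n_cols = len(M), len(M[0])
    pivots, r = [], 0
    for c in range(n_cols):
        p = next((i for i in range(r, n_rows) if M[i][c] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        piv = M[r][c]
        M[r] = [v / piv for v in M[r]]
        for i in range(n_rows):
            if i != r and M[i][c] != 0:
                f = M[i][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
        if r == n_rows:
            break
    return M, pivots

def kernel_vector(A):
    """A non-zero x with Ax = 0, or None if the columns of A are independent."""
    M, pivots = rref(A)
    n = len(A[0])
    free = [c for c in range(n) if c not in pivots]
    if not free:
        return None
    f = free[0]
    x = [Fraction(0)] * n
    x[f] = Fraction(1)
    for row_idx, c in enumerate(pivots):
        x[c] = -M[row_idx][f]
    return x

def preimage(A, x_prime):
    """When |A| != 0, the coordinates A^{-1} x' of the X with L(X) = X'."""
    M, pivots = rref([list(row) + [v] for row, v in zip(A, x_prime)])
    if pivots != list(range(len(A))):
        raise ValueError("A is singular")
    return [M[i][-1] for i in range(len(A))]

A_good = [[2, 1, 0], [0, 1, 1], [1, 0, 1]]    # hypothetical, |A| = 3
A_sing = [[1, 2, 3], [2, 4, 6], [0, 1, 1]]    # hypothetical, |A| = 0
print(preimage(A_good, [3, 2, 2]), kernel_vector(A_sing))
```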

We may summarize the results of Theorems 4.3.2 and 4.3.3 by saying that the following statements, relating to a linear mapping $L$ of a finite-dimensional linear manifold $\mathfrak{M}$ into itself, are equivalent.

(i) $L$ is a mapping of $\mathfrak{M}$ onto itself;

(ii) $L$ is an automorphism of $\mathfrak{M}$;

(iii) the matrices representing $L$ are non-singular.

Exercise 4.3.2. Let $\mathfrak{M}$, $\mathfrak{M}^*$ be linear manifolds of the same finite dimensionality and let $L$ be a linear mapping of $\mathfrak{M}$ into $\mathfrak{M}^*$. Show that the matrices representing $L$ are either all singular or all non-singular.

Exercise 4.3.3. Show that if $\mathfrak{M}$, $\mathfrak{M}^*$ are linear manifolds of the same finite dimensionality and $L$ is a linear mapping of $\mathfrak{M}$ into $\mathfrak{M}^*$, then the following statements are equivalent.

(i) $L$ is a mapping of $\mathfrak{M}$ onto $\mathfrak{M}^*$;

(ii) $L$ is an isomorphism between $\mathfrak{M}$ and $\mathfrak{M}^*$;

(iii) the matrices representing $L$ are non-singular.

4.4. Further instances of linear operators 

The idea of a linear operator introduced in Definition 4.2.2 pervades, in one form or another, a large part of mathematics. The operators we have encountered so far have all been associated with matrices, but there also exist linear operators of quite different types, and many of these are important in analysis and especially in the theories of differential and integral equations. The basic fact in this context is that differentiation and integration are linear operators, i.e.

$$\frac{d}{dx}\{\alpha f(x) + \beta g(x)\} = \alpha f'(x) + \beta g'(x);$$

$$\int_a^b \{\alpha f(x) + \beta g(x)\}\,dx = \alpha \int_a^b f(x)\,dx + \beta \int_a^b g(x)\,dx.$$

The linearity of all operators mentioned below is, as we shall see, an almost immediate consequence of this fact.

Consider, for example, the class $\mathfrak{C}$ of all real-valued functions of $t$ and its subclass $\mathfrak{C}_n$ consisting of those functions which possess derivatives of the $n$th order. Multiplication by scalars and addition of elements in $\mathfrak{C}$ can be defined in the obvious way, and with respect to these operations $\mathfrak{C}$ and $\mathfrak{C}_n$ are, of course, linear manifolds (of infinite dimensionality). Let, now, $\Omega$ be the operator

$$\Omega = a_0 \frac{d^n}{dt^n} + a_1 \frac{d^{n-1}}{dt^{n-1}} + \cdots + a_{n-1} \frac{d}{dt} + a_n,$$

where $a_0, a_1, \ldots, a_n$ are constants. If $\Omega$ operates upon a function $x = x(t)$ in $\mathfrak{C}_n$, the resulting function is denoted by $\Omega(x)$ or $\Omega x$, and is defined by the equation

$$\Omega x = a_0 \frac{d^n x}{dt^n} + a_1 \frac{d^{n-1} x}{dt^{n-1}} + \cdots + a_{n-1} \frac{dx}{dt} + a_n x.$$


It is clear that, for every real number $\alpha$ and every pair of functions $x, y \in \mathfrak{C}_n$,
we have

$$\Omega(\alpha x) = \alpha\,\Omega x, \qquad \Omega(x+y) = \Omega x+\Omega y.$$

Hence $\Omega$ is a linear transformation of $\mathfrak{C}_n$ into $\mathfrak{C}$, and it is precisely this fact
which underlies the theory of the differential equation

$$\Omega x = 0,$$

since it implies that every linear combination of solutions of the equation
is again a solution. The set of all solutions is, in fact, a linear manifold and,
as we may recall, its dimensionality is $n$ (provided that $a_n \neq 0$); thus every
solution may be expressed as a linear combination of $n$ linearly independent
solutions.
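The linearity of $\Omega$ can be checked mechanically on a convenient submanifold of $\mathfrak{C}_n$. The following Python sketch is illustrative only: it assumes, for simplicity, that the functions are polynomials in $t$ stored as coefficient lists $[c_0, c_1, c_2, \dots]$, and the function names and sample coefficients are hypothetical choices made for this example, not taken from the text.

```python
from itertools import zip_longest


def trim(p):
    """Drop trailing zero coefficients so that equal polynomials compare equal."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def derivative(p):
    """d/dt of c0 + c1 t + c2 t^2 + ..., given as the list [c0, c1, c2, ...]."""
    return [k * p[k] for k in range(1, len(p))]


def combine(alpha, p, beta, q):
    """Coefficient list of alpha*p + beta*q."""
    return trim(alpha * a + beta * b for a, b in zip_longest(p, q, fillvalue=0))


def omega(a, x):
    """Apply Omega = a[n] d^n/dt^n + ... + a[1] d/dt + a[0] to the polynomial x."""
    result = []
    term = list(x)  # successively x, x', x'', ...
    for coeff in a:
        result = combine(1, result, coeff, term)
        term = derivative(term)
    return result


# Omega = d^2/dt^2 - 3 d/dt + 2, applied to 1 + t^2 and to 4t + 5t^3.
a = [2, -3, 1]
x = [1, 0, 1]
y = [0, 4, 0, 5]
alpha, beta = 3, -2

lhs = omega(a, combine(alpha, x, beta, y))
rhs = combine(alpha, omega(a, x), beta, omega(a, y))
print(omega(a, x))  # [4, -6, 2], i.e. 4 - 6t + 2t^2
print(lhs == rhs)   # True: Omega(alpha x + beta y) = alpha Omega x + beta Omega y
```

Polynomials are used only because they keep the arithmetic exact; the same identity holds for every pair of functions in $\mathfrak{C}_n$.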

A somewhat similar situation arises with regard to numerous partial
differential equations. We may, by way of illustration, consider the three-dimensional
equation of heat conduction, namely,

$$\frac{\partial^2 u}{\partial x^2}+\frac{\partial^2 u}{\partial y^2}+\frac{\partial^2 u}{\partial z^2} = k\,\frac{\partial u}{\partial t}. \qquad (4.4.1)$$

If $\Omega$ denotes the operator

$$\Omega = \frac{\partial^2}{\partial x^2}+\frac{\partial^2}{\partial y^2}+\frac{\partial^2}{\partial z^2}-k\,\frac{\partial}{\partial t},$$

then this equation can be rewritten in the operational form $\Omega u = 0$. Now the
functions of the four variables $x, y, z, t$ which possess all the requisite partial
derivatives form a linear manifold and it is at once clear that $\Omega$ is a
linear operator in that manifold. Hence any linear combination of solutions
of $\Omega u = 0$ is again a solution, and the method of treatment of the equation
(4.4.1) depends essentially upon this fact. Exactly the same remarks apply
to most of the other standard equations of mathematical physics, such as
the wave equation and Laplace’s equation. Generally speaking, when the
operator specifying an (ordinary or partial) differential equation is linear,
the equation is far more likely to be tractable than when it is not.

We may also mention certain operators connected with integration. Let
$\mathfrak{J}$ be the class of real-valued functions integrable in the range $a \le t \le b$.
This class is obviously a linear manifold. The operator $\Omega$ defined by the
relation

$$\Omega x = \int_a^b x(t)\,dt$$

is clearly a linear transformation of $\mathfrak{J}$ into the manifold of real numbers. A
similar, but slightly more complicated, linear operator $\Theta$, defined for the
linear manifold of functions integrable in $-\infty < t < \infty$, is given by the
relation

$$\Theta x = \int_{-\infty}^{\infty} K(t)\,x(u-t)\,dt, \qquad (4.4.2)$$




where $K(t)$ is a given integrable function, known as the kernel of $\Theta$. Here
$\Theta x$ is a function of $u$, and so (when $K(t)$ satisfies suitable conditions) $\Theta$ is a
linear transformation of the class of integrable functions into itself. Operators
such as (4.4.2) occur in the theory of integral equations.

Many other ‘integral transforms’ involve linear operators. The Laplace
transform of a function $x(t)$, for example, is defined as the function $\bar{x}(p)$
given by the formula

$$\bar{x}(p) = \int_0^\infty e^{-pt}x(t)\,dt,$$

and the operation of changing $x(t)$ into $\bar{x}(p)$ is obviously linear. The treatment
of many types of differential equations depends precisely upon this fact.†


PROBLEMS ON CHAPTER IV 

1. L is a linear transformation of a linear manifold $\mathfrak{M}$ into a linear manifold
$\mathfrak{M}^*$. Show that the set of all $X \in \mathfrak{M}$ such that $L(X) = 0^*$, where $0^*$ denotes
the zero element of $\mathfrak{M}^*$, is a submanifold of $\mathfrak{M}$.

2. Show that every linear mapping of one linear manifold onto another 
preserves linear dependence but not necessarily linear independence. Show 
further that if the dimensionalities of the two linear manifolds are equal, then 
linear independence is also preserved. 

3. L is a linear transformation of a linear manifold $\mathfrak{M}$ into itself;
$\mathfrak{B} = \{E_1,\dots,E_n\}$ is a basis of $\mathfrak{M}$; $\nu(\mathfrak{B})$ is the maximum number of linearly
independent elements among $L(E_1),\dots,L(E_n)$; and $\mathfrak{M}(\mathfrak{B})$ is the submanifold
of $\mathfrak{M}$ spanned by $L(E_1),\dots,L(E_n)$. Show that both $\nu(\mathfrak{B})$ and $\mathfrak{M}(\mathfrak{B})$ are
independent of the choice of $\mathfrak{B}$.

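4. Let L, L' be linear transformations of a linear manifold $\mathfrak{M}$ into itself.
Show that, if $\alpha, \alpha'$ are any complex numbers and $\mathfrak{B}$ any basis of $\mathfrak{M}$, then

$$\mathfrak{A}(\alpha L+\alpha' L';\mathfrak{B}) = \alpha\,\mathfrak{A}(L;\mathfrak{B})+\alpha'\,\mathfrak{A}(L';\mathfrak{B}).$$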

5. L is a linear transformation of a linear manifold $\mathfrak{M}$ into a linear manifold
$\mathfrak{M}^*$. Show that $\mathfrak{M}^*$ possesses a submanifold $\mathfrak{M}'$ such that L effects a linear
transformation of $\mathfrak{M}$ onto $\mathfrak{M}'$.

6. Let L be a linear transformation of $\mathfrak{V}_3$ into itself which is represented,
with respect to the basis $(-1, 1, 1)$, $(1, 0, -1)$, $(0, 1, 1)$, by the matrix



Find the matrix representing L with respect to $\mathfrak{E}$.

7. A linear mapping L of $\mathfrak{V}_3$ into itself transforms the vectors $(-1, 0, 2)$,
(fi» 1» 1)* (3, “- 1 , 0 ) into ( — 6,0,3), ( 0 , — 1 , 6 ), ( — 6 , —1,9) respectively. Find 
the matrix representing L with respect to (i) the basis consisting of the first
three vectors given above; (ii) $\mathfrak{E}$.

† The reader who wishes to see how the algebraic idea of linearity can be
systematically employed in analysis should consult Courant and Hilbert, Methods
of Mathematical Physics.





8. The matrix 



represents the linear transformation L of the
complex space $\mathfrak{V}_2$ with respect to the basis $\mathfrak{E}$. Find the matrix representing
L with respect to the basis consisting of the vectors $(1, -i)$.

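9. L is a linear transformation of $\mathfrak{V}_n$ into itself, $\mathfrak{B}$ is a basis of $\mathfrak{V}_n$, and $\mathfrak{B}'$
is another basis obtained from $\mathfrak{B}$ by rearrangement of the vectors in $\mathfrak{B}$.
Determine the relation between $\mathfrak{A}(L;\mathfrak{B})$ and $\mathfrak{A}(L;\mathfrak{B}')$.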

10. L is a linear transformation of the $n$-dimensional linear manifold $\mathfrak{M}$
into itself; $\mathfrak{M}_1$ is the submanifold consisting of elements of the form $L(X)$
($X \in \mathfrak{M}$); and $\mathfrak{M}_2$ is the submanifold of elements $X \in \mathfrak{M}$ such that $L(X) = 0$,
where $0$ is the zero element of $\mathfrak{M}$. Prove that

$$d(\mathfrak{M}_1)+d(\mathfrak{M}_2) = n.$$

11. Verify the result of the preceding question for the case when $\mathfrak{M} = \mathfrak{V}_3$
and L is the matrix operator

$$\begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}.$$

12. Show that, with the obvious definitions of addition and multiplication by scalars,
the set of all polynomials in $t$ of degree less than $n$ is a linear manifold and that
$\mathfrak{B} = \{1, t, t^2, \dots, t^{n-1}\}$ is a basis of it. Show further that the operator $D$,
defined by the relation $Dp(t) = p'(t)$, is linear, and find $\mathfrak{A}(D;\mathfrak{B})$.

13. Let $\mathfrak{U}$, $\mathfrak{U}'$ be complements in $\mathfrak{V}_n$, and let L, M be linear transformations
of $\mathfrak{V}_n$ into itself such that $L(x) = M(x)$ whenever $x \in \mathfrak{U}$ or $x \in \mathfrak{U}'$.
Show that $L = M$.

14. Let $\mathfrak{M}$, $\mathfrak{M}^*$ be isomorphic linear manifolds, and let L be a linear
mapping of $\mathfrak{M}$ into itself. Show that there exists a linear mapping $L^*$ of $\mathfrak{M}^*$
into itself such that $L(X) \leftrightarrow L^*(X^*)$ whenever $X \leftrightarrow X^*$.

15. L is a linear mapping of an $n$-dimensional linear manifold $\mathfrak{M}$ into itself,
and $X_1,\dots,X_n$ are linearly independent elements of $\mathfrak{M}$. Prove that L is an
automorphism of $\mathfrak{M}$ if and only if $L(X_1),\dots,L(X_n)$ are linearly independent.

16. $\mathfrak{M}$ is an $n$-dimensional linear manifold over $\mathfrak{F}$, and $X, X^*$ are two
distinct elements in $\mathfrak{M}$. Show that there exists a linear transformation $f$ of
$\mathfrak{M}$ into $\mathfrak{F}$ such that $f(X) \neq f(X^*)$.

17. $\mathfrak{M}$ is a linear manifold of (finite) dimensionality $n$ over a field $\mathfrak{F}$, and
$\mathfrak{M}^*$ is the set of all linear transformations of $\mathfrak{M}$ into $\mathfrak{F}$. Show that, with
suitable definitions of multiplication by scalars and addition, $\mathfrak{M}^*$ is also a
linear manifold of dimensionality $n$.

18. $\mathfrak{M}$ is a linear manifold of dimensionality $n$. Show that the set of all
linear transformations of $\mathfrak{M}$ into itself is a linear manifold of dimensionality
$n^2$.

Generalize this result for the case of linear transformations of $\mathfrak{M}$ into a
second linear manifold $\mathfrak{M}'$.

19. Let L be a linear transformation of $\mathfrak{V}_n$ into itself and let

$$\mathfrak{B} = \{x_1,\dots,x_n\}, \qquad \mathfrak{B}' = \{x'_1,\dots,x'_n\}$$


be bases of $\mathfrak{V}_n$ such that, for $k = 1,\dots,n$, $x'_k$ is a linear combination of
$x_1,\dots,x_k$. Show that there exists a triangular matrix $A$ such that

$$A\,\mathfrak{A}(L;\mathfrak{B}') = \mathfrak{A}(L;\mathfrak{B})\,A.$$

20. Let $\mathfrak{B} = \{x_1,\dots,x_n\}$ be a basis of $\mathfrak{V}_n$ and suppose that $A$ is an $n\times n$
matrix such that $Ax_{r+1} = \dots = Ax_n = 0$, while each one of $Ax_1,\dots,Ax_r$ is
a linear combination of $x_1,\dots,x_r$. Prove that, if $A$ represents the linear
transformation L with respect to $\mathfrak{E}$, then the matrix representing L with respect
to $\mathfrak{B}$ has the form

$$\begin{pmatrix} A_1 & O \\ O & O \end{pmatrix},$$

where $A_1$ is an $r\times r$ matrix.

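21. Let $1 \le r < n$, and suppose that $\mathfrak{U}$ is an $r$-dimensional subspace of
$\mathfrak{V}_n$. Show that the general formula for the matrix $A$ which satisfies the
conditions

$$Ax = 0 \ \ (\text{for all } x \in \mathfrak{U}), \qquad Ax \neq 0 \ \ (\text{for all } x \notin \mathfrak{U}),$$

is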



where $S$ is any given non-singular matrix whose first $r$ columns constitute a
basis of $\mathfrak{U}$, while $P$ is an arbitrary $r\times(n-r)$ matrix and $Q$ an arbitrary
non-singular $(n-r)\times(n-r)$ matrix.

22. A transformation L of $\mathfrak{V}_n$ into itself is called a projection if there exist
complements $\mathfrak{U}$, $\mathfrak{U}'$ such that $L(x) = x$ for all $x \in \mathfrak{U}$ and $L(x) = 0$ for all
$x \in \mathfrak{U}'$. Show that (i) a projection is a linear transformation; (ii) a linear
transformation L is a projection if and only if $L^2 = L$; (iii) a linear
transformation L is a projection if and only if $I-L$ is a projection, where $I$
denotes the identical transformation of $\mathfrak{V}_n$.

23. Let $L_1$, $L_2$ be two projections. Show that (i) $L_1+L_2$ is a projection if
and only if $L_1L_2 = L_2L_1 = 0$; (ii) $L_1-L_2$ is a projection if and only if

$$L_1L_2 = L_2L_1 = L_2.$$




V 


SYSTEMS OF LINEAR EQUATIONS AND 
RANK OF MATRICES 


In the present chapter we shall give a complete account of the 
theory of simultaneous linear equations. The results of Chapter IV 
will not be needed here and we shall make use only of the simplest 
properties of determinants, vectors, and matrices. The most 
important new idea which we shall introduce is that of rank of a 
matrix. 


5.1. Preliminary results 

In this section we shall explain the terminology and notation and 
consider the simplest cases of our problem.

5.1.1. Definition 5.1.1. A linear equation in the unknowns $x_1,\dots,x_n$
is an equation of the form

$$a_1x_1+\dots+a_nx_n = b. \qquad (5.1.1)$$

If $b = 0$, then (5.1.1) is known as a homogeneous linear equation.


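A system of linear equations in the unknowns $x_1,\dots,x_n$ has the form

$$\begin{aligned} a_{11}x_1+\dots+a_{1n}x_n &= b_1 \\ &\;\;\vdots \\ a_{m1}x_1+\dots+a_{mn}x_n &= b_m \end{aligned} \qquad (5.1.2)$$

where the $a_{ij}$ and $b_i$ are given numbers. The associated system of
homogeneous equations is given by

$$\begin{aligned} a_{11}x_1+\dots+a_{1n}x_n &= 0 \\ &\;\;\vdots \\ a_{m1}x_1+\dots+a_{mn}x_n &= 0 \end{aligned} \qquad (5.1.3)$$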

Our problem is to investigate whether a set or sets of values of $x_1,\dots,x_n$
satisfying (5.1.2), or (5.1.3), can be found; how all such sets
may be determined in the case when they exist; and what are the
relations between them.

A certain amount of complication is introduced into the study of
linear equations by the possible presence of redundant equations. 
Consider, for instance, the system of equations 

$$x+y = 5, \qquad 2x+y = 3, \qquad 3x+2y = 8. \qquad (5.1.4)$$





It is clear that the last equation adds nothing to the information
provided by the first two, since it can be obtained by adding these
equations. Such an equation, which arises from a linear combination
of other equations of the system, is called redundant (with
respect to the system in question). It is easy to see that redundant
equations can be discarded.

Exercise 5.1.1. Discuss the fallacy involved in the following argument.
‘The second and third equations in (5.1.4) are both redundant since the
second is equal to the difference of the third and first while the third is equal
to the sum of the first and second. Hence the second and third equations
may both be discarded, and the solutions of the system (5.1.4) are precisely
the solutions of the single equation $x+y = 5$.’

Definition 5.1.2. A solution of a system of equations in the
unknowns $x_1,\dots,x_n$ is a set of numbers $\xi_1,\dots,\xi_n$ such that every equation
of the system is satisfied for the values $x_1 = \xi_1, \dots, x_n = \xi_n$.

Even quite trivial instances reveal that a variety of different 
cases may arise with regard to the nature of solutions of a system 
of linear equations. Thus the system 

^1+^2 == ^ 
possesses no solution; the system

$$x_1+x_2 = 2, \qquad x_1-x_2 = 0$$

possesses precisely one solution ($x_1 = x_2 = 1$); and the system

$$x_1+x_2 = 1, \qquad 2x_1+2x_2 = 2$$

possesses an infinity of solutions ($x_1 = t$, $x_2 = 1-t$, $t$ arbitrary).

Definition 5.1.3. A system of equations is consistent if it
possesses at least one solution; otherwise it is inconsistent.

A homogeneous system† is necessarily consistent since (5.1.3)
always possesses the zero solution $x_1 = \dots = x_n = 0$. Such a
solution is, however, of little interest.

Definition 5.1.4. The solution $x_1 = \dots = x_n = 0$ of a homogeneous
system in the unknowns $x_1,\dots,x_n$ is called trivial; any
other solution is called non-trivial.

A system of equations not all of which are homogeneous may, of 
course, be consistent or inconsistent; the examples given a little 
earlier illustrate both alternatives. 


† i.e. a system of homogeneous linear equations.




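In our discussion below we shall make systematic use of matrix
technique. Thus the system of linear equations (5.1.2) will generally
be written in the form

$$Ax = b,$$

where

$$A = \begin{pmatrix} a_{11} & \dots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \dots & a_{mn} \end{pmatrix}, \qquad x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}, \qquad b = \begin{pmatrix} b_1 \\ \vdots \\ b_m \end{pmatrix}.$$

Correspondingly, the homogeneous system (5.1.3) assumes the form

$$Ax = 0.$$

A solution (of either system) will, accordingly,
be regarded as a column vector $(\xi_1,\dots,\xi_n)^T$.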

5.1.2. It is convenient to determine at once the connexion
between a system of linear equations $Ax = b$ and the associated
homogeneous system $Ax = 0$.

Theorem 5.1.1. Suppose that the system $Ax = b$ possesses a
solution $x_0$. Then (i) any solution of this system is expressible as the
sum of $x_0$ and a suitable solution of $Ax = 0$; (ii) the sum of $x_0$ and any
solution of $Ax = 0$ is a solution of $Ax = b$.

We may express this result informally by saying that the general
solution of $Ax = b$ is equal to the sum of any particular solution
of $Ax = b$ and the general solution of $Ax = 0$.

The proof is trivial. For let $x_1$ be any solution of $Ax = b$. Then
$x_1 = x_0+(x_1-x_0)$, and

$$A(x_1-x_0) = Ax_1-Ax_0 = b-b = 0.$$

Thus $x_1 = x_0+y$, where $y$ is a solution of $Ax = 0$, and so (i) is
proved. Again, let $x_2$ be any solution of $Ax = 0$. Then

$$A(x_0+x_2) = Ax_0+Ax_2 = b+0 = b,$$

and so $x_0+x_2$ is a solution of $Ax = b$. This proves (ii).
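Both halves of Theorem 5.1.1 are easy to test on a small example. The Python sketch below is illustrative only: it borrows the first system of Exercise 5.3.1, and the particular solution `x0`, the homogeneous solution `y` and the second solution `x1` were found by hand for this example (they are hypothetical choices, checked by the assertions, not values given in the text).

```python
def mat_vec(A, x):
    """Product of the matrix A (a list of rows) and the column vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


A = [[1, 1, 1],
     [2, 1, -2]]
b = [0, 1]

x0 = [1, -1, 0]   # a particular solution of Ax = b
y = [3, -4, 1]    # a solution of the homogeneous system Ax = 0

assert mat_vec(A, x0) == b
assert mat_vec(A, y) == [0, 0]

# (ii): x0 plus any solution t*y of Ax = 0 is again a solution of Ax = b.
for t in range(-3, 4):
    x = [u + t * v for u, v in zip(x0, y)]
    assert mat_vec(A, x) == b

# (i): for another solution x1 of Ax = b, the difference x1 - x0 solves Ax = 0.
x1 = [7, -9, 2]
assert mat_vec(A, x1) == b
assert mat_vec(A, [u - v for u, v in zip(x1, x0)]) == [0, 0]
print("Theorem 5.1.1 checks passed")
```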

The demonstration just given succeeds by virtue of the fact that matrix
multiplication is a linear operation.† In this context it is useful to recall that
a result analogous to Theorem 5.1.1 holds in the theory of differential
equations.‡ Let $Q(t)$ be a given function of $t$, $f$ a polynomial, and $D$ the
operator $d/dt$. Consider the differential equation

$$f(D)x = Q(t) \qquad (5.1.5)$$

and its auxiliary equation

$$f(D)x = 0. \qquad (5.1.6)$$

† See Definition 4.2.2 (p. 114). ‡ Compare the remarks in § 4.4.





Then, as we know, the general solution of (5.1.5) is equal to the sum of any
particular solution of (5.1.5) and the general solution of (5.1.6). The proof of
this result depends on the linearity of the differential operator $f(D)$ in
precisely the same way as the proof of Theorem 5.1.1 depends on the linearity
of the matrix operator $A$.


5.1.3. The simplest case of a system of linear equations is that
in which the number of unknowns is equal to the number of equations.
Since the general process of solution depends on reducing
any given system to an equivalent system of this special type, we
begin by discussing systems of the form (5.1.2) for which $m = n$.


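Theorem 5.1.2. (Cramer’s rule)

If the $n\times n$ matrix $A = (a_{ij})$ is non-singular, then the system of
linear equations

$$\begin{aligned} a_{11}x_1+\dots+a_{1n}x_n &= b_1 \\ &\;\;\vdots \\ a_{n1}x_1+\dots+a_{nn}x_n &= b_n \end{aligned} \qquad (5.1.7)$$

possesses a unique solution given by

$$x_i = \frac{|A_i|}{|A|} \qquad (i = 1,\dots,n), \qquad (5.1.8)$$

where $A_i$ is the matrix obtained when the $i$-th column of $A$ is replaced
by the vector $b = (b_1,\dots,b_n)^T$.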

The solution (5.1.8) was found by Cramer in 1750. Essentially
the same result was known, however, to Leibnitz some fifty years
earlier.

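If we rewrite (5.1.7) in the matrix form $Ax = b$, where
$x = (x_1,\dots,x_n)^T$, we see at once that the existence and uniqueness
of the solution are guaranteed by Theorem 3.6.2 (i) (p. 92), which
also shows that the solution is given by the formula $x = A^{-1}b$.
Denoting the cofactor of the $(i,j)$th element in $A$ by $A_{ij}$, we therefore
obtain

$$\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \frac{1}{|A|}\begin{pmatrix} A_{11} & \dots & A_{n1} \\ \vdots & & \vdots \\ A_{1n} & \dots & A_{nn} \end{pmatrix}\begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix}.$$

Hence, by Theorem 1.4.1 (p. 16), we have for $i = 1,\dots,n$,

$$x_i = \frac{1}{|A|}(A_{1i}b_1+\dots+A_{ni}b_n) = \frac{|A_i|}{|A|}.$$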
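Formula (5.1.8) translates directly into code. In the following Python sketch, offered as an illustration rather than a definitive implementation, determinants are evaluated exactly over the rationals with `fractions.Fraction`. Evaluating each determinant by elimination, rather than by cofactor expansion, is our own choice of technique, and the function names `det` and `cramer` are hypothetical. The first test system is the one with the unique solution $x_1 = x_2 = 1$ mentioned after Definition 5.1.2; the $3\times 3$ system is an extra example of our own.

```python
from fractions import Fraction


def det(M):
    """Exact determinant of a square matrix, by elimination over the rationals."""
    A = [[Fraction(v) for v in row] for row in M]
    n = len(A)
    d = Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if A[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            A[c], A[pivot] = A[pivot], A[c]
            d = -d  # interchanging two rows changes the sign of the determinant
        d *= A[c][c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for k in range(c, n):
                A[r][k] -= f * A[c][k]
    return d


def cramer(A, b):
    """Solve Ax = b by x_i = |A_i| / |A|, A_i being A with its i-th column replaced by b."""
    d = det(A)
    if d == 0:
        raise ValueError("A is singular, so Cramer's rule does not apply")
    n = len(A)
    solution = []
    for i in range(n):
        A_i = [row[:i] + [b[r]] + row[i + 1:] for r, row in enumerate(A)]
        solution.append(det(A_i) / d)
    return solution


print(cramer([[1, 1], [1, -1]], [2, 0]))                           # x1 = x2 = 1
print(cramer([[2, 1, -1], [-3, -1, 2], [-2, 1, 2]], [8, -11, -3]))  # x, y, z = 2, 3, -1
```

Since it requires $n+1$ determinants of order $n$, Cramer's rule is chiefly of theoretical value; for numerical work on larger systems, elimination applied directly to the equations is far cheaper.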




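As an illustration consider the system of equations

$$ax+by = k, \qquad a'x+b'y = k'.$$

Cramer’s rule shows that, if

$$\begin{vmatrix} a & b \\ a' & b' \end{vmatrix} \neq 0,$$

then the system has a unique solution given by

$$x = \begin{vmatrix} k & b \\ k' & b' \end{vmatrix} \bigg/ \begin{vmatrix} a & b \\ a' & b' \end{vmatrix}, \qquad y = \begin{vmatrix} a & k \\ a' & k' \end{vmatrix} \bigg/ \begin{vmatrix} a & b \\ a' & b' \end{vmatrix}.$$

Again, if

$$D = \begin{vmatrix} a & b & c \\ a' & b' & c' \\ a'' & b'' & c'' \end{vmatrix} \neq 0,$$

then the system

$$\begin{aligned} ax+by+cz &= k \\ a'x+b'y+c'z &= k' \\ a''x+b''y+c''z &= k'' \end{aligned}$$

has a unique solution given by

$$x = D^{-1}\begin{vmatrix} k & b & c \\ k' & b' & c' \\ k'' & b'' & c'' \end{vmatrix}, \qquad y = D^{-1}\begin{vmatrix} a & k & c \\ a' & k' & c' \\ a'' & k'' & c'' \end{vmatrix}, \qquad z = D^{-1}\begin{vmatrix} a & b & k \\ a' & b' & k' \\ a'' & b'' & k'' \end{vmatrix}.$$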


Corollary. If A is a non-singular square matrix, then the only 
solution of the homogeneous system Ax = 0 is the trivial solution 

x = 0. 


An equivalent statement is that, if A is a square matrix and 
Ax = 0 possesses a non-trivial solution, then A is singular. This 
result is not, of course, new to us as it is contained in Theorem 1.6.1 
(p. 27). 

Exercise 5.1.2. Let $a_{ij}$ $(i,j = 1,\dots,n)$ be given numbers. Show (say by
considering components of vectors) that a necessary and sufficient condition
for the existence of vectors $x_1,\dots,x_n$ (of order $n$), not all zero and satisfying
the equations

$$\sum_{j=1}^{n} a_{ij}x_j = 0 \qquad (i = 1,\dots,n),$$

is the vanishing of the determinant $|a_{ij}|_n$.




5.2. The rank theorem 

In this section we shall establish a link between the notion of a 
determinant and that of linear dependence, and thereby obtain the 
preliminary results necessary for the solution of the main problem. 
The key idea in this context is that of rank, first discussed explicitly 
by Sylvester in 1851. 

Definition 5.2.1. (i) Let $A$ be an $m\times n$ matrix. If $k \le m$,
$l \le n$, then any $k$ rows and $l$ columns of $A$ determine a $k\times l$ submatrix
of $A$. (ii) The determinant of a $k\times k$ submatrix of $A$ is
called a $k$-rowed minor of $A$, or a minor of order $k$.

Definition 5.2.2. (i) The rank (sometimes called determinant
rank) $R(A)$ of a non-zero matrix $A$ is the maximum value of $r$ for
which there exists a non-vanishing $r$-rowed minor of $A$. (ii) A
critical minor of a non-zero matrix is any non-vanishing minor of
maximum order. (iii) The rank of any zero matrix is equal to zero.

Thus, for $A \neq O$, the statement $R(A) = r$ means that (i) $A$
contains at least one non-vanishing minor of order $r$; (ii) $A$ contains
no non-vanishing minor of order greater than $r$. In order to show
for a particular matrix $A$ that (ii) is satisfied it is, of course, sufficient
to verify that all minors of order $r+1$ vanish.

Exercise 5.2.1. Use Exercise 1.4.4 (p. 20) to prove this statement.

It is obvious from Definition 5.2.2 (i) that, for any non-zero
matrix $A$ of type $m\times n$, the rank cannot exceed $m$ or $n$, i.e.

$$0 < R(A) \le \min(m,n).$$

It is equally obvious that, for a square matrix $A$ of order $n$,
$R(A) < n$ or $R(A) = n$ according as $A$ is singular or non-singular.
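To illustrate the notion of rank, consider the matrix

$$A = \begin{pmatrix} -1 & 0 & 2 & 1 \\ 0 & 1 & 1 & -1 \\ 2 & 0 & -4 & -2 \end{pmatrix}.$$

This matrix possesses no minors of order greater than 3; and its
four minors of order 3 all vanish, since

$$\begin{vmatrix} -1 & 0 & 2 \\ 0 & 1 & 1 \\ 2 & 0 & -4 \end{vmatrix} = \begin{vmatrix} -1 & 0 & 1 \\ 0 & 1 & -1 \\ 2 & 0 & -2 \end{vmatrix} = \begin{vmatrix} -1 & 2 & 1 \\ 0 & 1 & -1 \\ 2 & -4 & -2 \end{vmatrix} = \begin{vmatrix} 0 & 2 & 1 \\ 1 & 1 & -1 \\ 0 & -4 & -2 \end{vmatrix} = 0.$$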




On the other hand, $A$ contains at least one non-vanishing 2-rowed
minor, e.g. that situated in the top left-hand corner of $A$, namely
$\begin{vmatrix} -1 & 0 \\ 0 & 1 \end{vmatrix} = -1$. Hence $R(A) = 2$. The reader should
have no difficulty in determining a number of critical minors of $A$
in addition to the one just mentioned.
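Definition 5.2.2 can be applied quite literally by a computer, at least for small matrices. The Python sketch below is illustrative only, and its helper names are hypothetical. It examines the minors of order $\min(m,n)$, $\min(m,n)-1$, and so on, and stops at the first order at which a non-vanishing minor appears, returning that order together with every critical minor, each identified by its row and column indices counted from 0. The number of minors grows very rapidly with the size of the matrix, so this serves as a check on the definition rather than as a practical method.

```python
from fractions import Fraction
from itertools import combinations


def det(M):
    """Exact determinant of a square matrix, by elimination over the rationals."""
    A = [[Fraction(v) for v in row] for row in M]
    n = len(A)
    d = Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if A[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            A[c], A[pivot] = A[pivot], A[c]
            d = -d
        d *= A[c][c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for k in range(c, n):
                A[r][k] -= f * A[c][k]
    return d


def submatrix(A, rows, cols):
    return [[A[i][j] for j in cols] for i in rows]


def critical_minors(A):
    """Return R(A) and the (rows, cols) of all critical minors, as in Definition 5.2.2."""
    m, n = len(A), len(A[0])
    for r in range(min(m, n), 0, -1):
        found = [(rows, cols)
                 for rows in combinations(range(m), r)
                 for cols in combinations(range(n), r)
                 if det(submatrix(A, rows, cols)) != 0]
        if found:
            return r, found
    return 0, []  # the zero matrix has rank 0 and no critical minors


A = [[-1, 0, 2, 1],
     [0, 1, 1, -1],
     [2, 0, -4, -2]]

rank, minors = critical_minors(A)
print(rank)                        # 2
print(((0, 1), (0, 1)) in minors)  # True: the top left-hand 2-rowed minor
print(len(minors))                 # 12
```

Since the third row of $A$ is $-2$ times the first, every 2-rowed minor taken from rows 1 and 3 vanishes, while all six taken from rows 1, 2 and all six taken from rows 2, 3 are non-zero; this accounts for the twelve critical minors.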

Exercise 5.2.2. Show that if $A'$ is a submatrix of $A$, then $R(A') \le R(A)$.

Exercise 5.2.3. Show that, for every non-zero scalar $\lambda$, $R(\lambda A) = R(A)$.

Exercise 5.2.4. Show that the rank of a diagonal matrix with $n-k$ zero
elements and $k$ non-zero elements on the diagonal is equal to $k$.

Exercise 5.2.5. Show that the rank of a matrix remains unchanged if the
rows, or the columns, are permuted, or if the matrix is transposed.

In what follows we shall have frequent occasion to speak of 
linear combinations and of linear dependence of the rows (or 
columns) of a matrix. This terminology introduces no new ideas 
and simply means that the rows or columns of the matrix are 
treated as vectors. 

Definition 5.2.3. (i) The row rank (column rank) of a
matrix $A \neq O$ is the maximum value of $r$ for which there exist $r$ linearly
independent rows (columns) of $A$. (ii) The row rank and column rank
of any zero matrix are both equal to zero.

In view of Exercise 2.3.5 (p. 51) it is clear that the row rank 
(column rank) of A is equal to the dimensionality of the vector 
space spanned by the rows (columns) of A. 

We need the next theorem in order to demonstrate that the rank, 
row rank, and column rank of a matrix are all equal. 

Theorem 5.2.1. Let $R(A) = r$ $(\ge 1)$. Then any $r$ rows (columns)
of $A$ which contain a critical minor are linearly independent, and
every row (column) of $A$ may be expressed linearly in terms of these
$r$ rows (columns).

It is, of course, sufficient to prove this theorem for rows. Let $A$
be the $m\times n$ matrix $(a_{ij})$ and assume, as may be done without loss of
generality, that a critical minor $\Delta$ is situated in the top left-hand
corner of $A$, i.e.

$$\Delta = \begin{vmatrix} a_{11} & \dots & a_{1r} \\ \vdots & & \vdots \\ a_{r1} & \dots & a_{rr} \end{vmatrix} \neq 0. \qquad (5.2.1)$$




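We then have to show that (i) the first $r$ rows of $A$ are linearly
independent; (ii) every row of $A$ is expressible linearly in terms of
the first $r$ rows.

Suppose $\gamma_1,\dots,\gamma_r$ are numbers such that

$$\gamma_1(a_{11},\dots,a_{1n})+\dots+\gamma_r(a_{r1},\dots,a_{rn}) = 0.$$

Equating to zero the first $r$ components of the vector on the left, we
obtain

$$\begin{aligned} \gamma_1a_{11}+\dots+\gamma_ra_{r1} &= 0, \\ &\;\;\vdots \\ \gamma_1a_{1r}+\dots+\gamma_ra_{rr} &= 0. \end{aligned}$$

Hence, by (5.2.1) and the corollary to Theorem 5.1.2, it follows that
$\gamma_1 = \dots = \gamma_r = 0$. Thus the first $r$ rows of $A$ are, in fact, linearly
independent.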

We shall show next that every row of $A$ is expressible linearly
in terms of the first $r$ rows. When $r = m$ there is nothing to prove,
and we may therefore assume that $r < m$, $r \le n$. Consider the
determinant

$$D = \begin{vmatrix} a_{11} & \dots & a_{1r} & a_{1j} \\ \vdots & & \vdots & \vdots \\ a_{r1} & \dots & a_{rr} & a_{rj} \\ a_{i1} & \dots & a_{ir} & a_{ij} \end{vmatrix},$$

where $r+1 \le i \le m$, $1 \le j \le n$. If $j \le r$, then $D$ possesses two
identical columns and therefore vanishes. If $j > r$, then $D$ is an
$(r+1)$-rowed minor of $A$ and so vanishes in virtue of the assumption
$R(A) = r$. Thus, in every case, $D = 0$.

The cofactor of the element $a_{tj}$ $(t = 1,\dots,r)$ in $D$ depends on $t$
and $i$ (but not on $j$), and so may be denoted by $\Delta_{ti}$. The cofactor of
$a_{ij}$ in $D$ is obviously $\Delta$. Expanding $D$ in terms of the elements of
the last column we therefore obtain

$$0 = D = a_{1j}\Delta_{1i}+\dots+a_{rj}\Delta_{ri}+a_{ij}\Delta.$$

Hence, by (5.2.1), there exist numbers $\mu_{ti}$ (namely $\mu_{ti} = -\Delta_{ti}/\Delta$) such that

$$a_{ij} = \mu_{1i}a_{1j}+\dots+\mu_{ri}a_{rj} \qquad (i = r+1,\dots,m;\ j = 1,\dots,n).$$

These relations may be written as

$$(a_{i1},\dots,a_{in}) = \mu_{1i}(a_{11},\dots,a_{1n})+\dots+\mu_{ri}(a_{r1},\dots,a_{rn}) \qquad (i = r+1,\dots,m),$$

and the proof is therefore complete.
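As a check on the argument, we may follow it through (purely as an illustration) for the $3\times 4$ matrix $A$ with rows $(-1, 0, 2, 1)$, $(0, 1, 1, -1)$, $(2, 0, -4, -2)$ used earlier in this section. Here $r = 2$, the critical minor in the top left-hand corner is $\Delta = -1$, and only $i = 3$ arises. In the notation of the proof, expanding $D$ by its last column gives

$$D = \begin{vmatrix} -1 & 0 & a_{1j} \\ 0 & 1 & a_{2j} \\ 2 & 0 & a_{3j} \end{vmatrix} = a_{1j}\begin{vmatrix} 0 & 1 \\ 2 & 0 \end{vmatrix} - a_{2j}\begin{vmatrix} -1 & 0 \\ 2 & 0 \end{vmatrix} + a_{3j}\begin{vmatrix} -1 & 0 \\ 0 & 1 \end{vmatrix} = -2a_{1j} + 0\cdot a_{2j} - a_{3j},$$

so that $\Delta_{13} = -2$, $\Delta_{23} = 0$, and hence $\mu_{13} = -\Delta_{13}/\Delta = -2$, $\mu_{23} = 0$. Since $D = 0$ for every $j$, we obtain $a_{3j} = -2a_{1j}$ $(j = 1,\dots,4)$, that is,

$$(2, 0, -4, -2) = -2\,(-1, 0, 2, 1) + 0\cdot(0, 1, 1, -1).$$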







Theorem 5.2.2. (Rank theorem)

The rank, the row rank, and the column rank of a matrix are all
equal.

Let the row rank and the column rank of a matrix $A$ be denoted
by $R_1(A)$ and $R_2(A)$ respectively. It is clearly sufficient to show
that $R(A) = R_1(A)$, for $R(A) = R_2(A)$ will then follow by an
analogous argument. Alternatively, we may deduce the second
relation from the first by observing that

$$R(A) = R(A^T) = R_1(A^T) = R_2(A).$$

The theorem is trivially true for $A = O$, and we may therefore
assume that $A$ is a non-zero matrix, say of type $m\times n$. If
$R(A) = r$ $(\ge 1)$, then, by Theorem 5.2.1, $A$ possesses $r$ linearly
independent rows, and every row of $A$ is expressible
as a linear combination of these $r$ rows. Denote by $\mathfrak{V}$ the vector space
(of order $n$) spanned by all rows of $A$. Since (in view of Exercise
2.2.5, p. 48) $\mathfrak{V}$ is also spanned by these $r$ rows, and since they
are linearly independent, it follows, by Theorem 2.3.1 (p. 50), that
they constitute a basis of $\mathfrak{V}$. Hence $d(\mathfrak{V}) = r = R(A)$. But, as
was pointed out immediately after Definition 5.2.3, $R_1(A) = d(\mathfrak{V})$,
and so $R(A) = R_1(A)$.
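The rank theorem can be spot-checked on examples. The Python sketch below is an illustration under our own assumptions, not part of the text's argument, and its function names are hypothetical. It computes the row rank by exact row reduction (elementary row operations leave the space spanned by the rows unchanged, so the number of pivots found equals the row rank), obtains the column rank by applying the same function to the transpose, and compares both with the determinant rank of Definition 5.2.2. Row reduction is a swapped-in technique here; the proof above relies on determinants throughout.

```python
from fractions import Fraction
from itertools import combinations


def det(M):
    """Exact determinant of a square matrix, by elimination over the rationals."""
    A = [[Fraction(v) for v in row] for row in M]
    n = len(A)
    d = Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if A[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            A[c], A[pivot] = A[pivot], A[c]
            d = -d
        d *= A[c][c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for k in range(c, n):
                A[r][k] -= f * A[c][k]
    return d


def determinant_rank(A):
    """Largest r such that some r-rowed minor of A is non-zero (Definition 5.2.2)."""
    m, n = len(A), len(A[0])
    for r in range(min(m, n), 0, -1):
        for rows in combinations(range(m), r):
            for cols in combinations(range(n), r):
                if det([[A[i][j] for j in cols] for i in rows]) != 0:
                    return r
    return 0


def row_rank(A):
    """Maximum number of linearly independent rows, by exact row reduction."""
    M = [[Fraction(v) for v in row] for row in A]
    m, n = len(M), len(M[0])
    rank = 0
    for c in range(n):
        pivot = next((r for r in range(rank, m) if M[r][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        M[rank], M[pivot] = M[pivot], M[rank]
        for r in range(m):
            if r != rank and M[r][c] != 0:
                f = M[r][c] / M[rank][c]
                M[r] = [u - f * v for u, v in zip(M[r], M[rank])]
        rank += 1
        if rank == m:
            break
    return rank


def column_rank(A):
    return row_rank([list(col) for col in zip(*A)])


examples = [
    [[-1, 0, 2, 1], [0, 1, 1, -1], [2, 0, -4, -2]],
    [[1, 2, 3], [2, 4, 6], [1, 0, 1]],
    [[1, 2], [3, 4], [5, 6]],
    [[0, 0], [0, 0]],
]
for A in examples:
    print(determinant_rank(A), row_rank(A), column_rank(A))  # three equal numbers
```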

Corollary. The maximum number of linearly independent rows 
of a matrix is equal to the maximum number of its linearly independent 
columns. 

Alternatively, we may say that the rows and the columns of a 
matrix span vector spaces of the same dimensionality. It is worth 
noting that although this result involves only the notion of linear 
dependence, we have found it simplest to establish it indirectly by 
appealing to the theory of determinants.†

Since rank, row rank, and column rank are now known to be
equal, we need no longer differentiate between them. When in
future the term ‘rank’ is used we shall bear in mind that it denotes
a number which possesses all three properties specified in
Definitions 5.2.2 and 5.2.3.

Exercise 5.2.6. Show that a square matrix is singular if and only if its
rows (or columns) are linearly dependent.

† For proofs independent of determinant theory see Artin, Galois Theory
(2nd edition), 7-9; Schreier and Sperner, 4, 117-19; or Perlis, 6, 66. See also
Hasse, 13, 68-103, for a systematic development of linear algebra without the use
of determinants.




The following simple result will prove very useful in our subsequent
discussion.

Theorem 5.2.3. Let $A$ be a given $m\times n$ matrix. The set of all
vectors of the form $Ax$, where $x$ is an arbitrary vector of order $n$, is a
vector space of order $m$ and dimensionality $R(A)$.

Let $\mathfrak{V}$ denote the set of all vectors of the form $Ax$. In view of the
linearity of matrix multiplication $\mathfrak{V}$ is obviously a vector space (of
order $m$). By Theorem 3.3.6 (p. 84) we have

$$Ax = x_1A_{*1}+\dots+x_nA_{*n},$$

where $x = (x_1,\dots,x_n)^T$ and $A_{*1},\dots,A_{*n}$ are the columns of $A$. Hence $\mathfrak{V}$ is
spanned by $A_{*1},\dots,A_{*n}$, and so $d(\mathfrak{V})$ is equal to the maximum number of
linearly independent columns of $A$, i.e. $d(\mathfrak{V}) = R(A)$.


5.3. The general theory of linear equations 

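5.3.1. The results of the previous section furnish all the
necessary preliminary material and enable us to construct a complete
theory of linear equations.†

Definition 5.3.1. In the system of linear equations

$$\begin{aligned} a_{11}x_1+\dots+a_{1n}x_n &= b_1 \\ &\;\;\vdots \\ a_{m1}x_1+\dots+a_{mn}x_n &= b_m \end{aligned} \qquad (5.3.1)$$

the matrix

$$A = \begin{pmatrix} a_{11} & \dots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \dots & a_{mn} \end{pmatrix}$$

is known as the matrix of coefficients, while

$$B = \begin{pmatrix} a_{11} & \dots & a_{1n} & b_1 \\ \vdots & & \vdots & \vdots \\ a_{m1} & \dots & a_{mn} & b_m \end{pmatrix}$$

is known as the augmented matrix.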


Theorem 5.3.1. (Consistency theorem) 

A necessary and sufficient condition for a system of linear equations
to be consistent is that the matrix of coefficients should have the same 
rank as the augmented matrix. 

t For an alternative treatment see Wade, 1, 147-55. 





The system of equations (5.3.1) is, in view of Theorem 3.3.6,
equivalent to

$$x_1A_{*1}+\dots+x_nA_{*n} = b, \qquad (5.3.2)$$

where $b = (b_1,\dots,b_m)^T$. Moreover, the augmented matrix $B$ has
$A_{*1},\dots,A_{*n}, b$ as its columns.

To prove necessity, suppose that there exists a solution $\xi_1,\dots,\xi_n$
of (5.3.2). Then $b$ is a linear combination of the columns of $A$, and
so $B$ cannot have a greater number of linearly independent columns
than $A$. Hence $R(B) \le R(A)$. Furthermore, since (by Exercise
5.2.2, p. 137) $R(A) \le R(B)$, we infer that $R(A) = R(B)$.

Sufficiency is established as follows. Suppose that

$$R(A) = R(B) = r \ge 1$$

(the case $r = 0$ being trivial). Consider $r$ columns of $A$ which contain
a critical minor, say $\Delta$, and assume, without loss of generality,
that $A_{*1},\dots,A_{*r}$ are such columns. By hypothesis, $\Delta$ is also a critical
minor of $B$ and therefore, by Theorem 5.2.1, $b$ is a linear combination
of $A_{*1},\dots,A_{*r}$, say

$$b = \xi_1A_{*1}+\dots+\xi_rA_{*r}.$$

It follows that (5.3.2) is satisfied by $x_1 = \xi_1, \dots, x_r = \xi_r$,
$x_{r+1} = 0, \dots, x_n = 0$;† and the system is therefore consistent.


Since the rank of a matrix can be found by evaluating a number
of determinants, it follows that the result just proved furnishes
not merely a theoretical criterion but also a practical procedure for
testing a system of linear equations for consistency. Consider, for
example, the system of equations

$$3x+y-5z = -1, \qquad x-2y+z = -5, \qquad x+5y-7z = 2. \qquad (5.3.3)$$

Here the matrix of coefficients is

$$A = \begin{pmatrix} 3 & 1 & -5 \\ 1 & -2 & 1 \\ 1 & 5 & -7 \end{pmatrix}.$$

† For $r = n$ we have simply $x_1 = \xi_1, \dots, x_n = \xi_n$.




It can be verified at once that the determinant of $A$ vanishes, and
that therefore $R(A) < 3$. Again, the augmented matrix is

$$B = \begin{pmatrix} 3 & 1 & -5 & -1 \\ 1 & -2 & 1 & -5 \\ 1 & 5 & -7 & 2 \end{pmatrix},$$

and the 3-rowed minor of $B$ consisting of the first, second, and
fourth columns has the value 49. Hence $R(B) = 3$, and so
$R(A) < R(B)$. The system (5.3.3) is therefore inconsistent.
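The calculation just described can be reproduced exactly. In the following Python sketch (illustrative only; the helper names are hypothetical), the rank of each matrix is found directly from Definition 5.2.2 by searching its minors, and Theorem 5.3.1 then decides consistency by comparing $R(A)$ with $R(B)$.

```python
from fractions import Fraction
from itertools import combinations


def det(M):
    """Exact determinant of a square matrix, by elimination over the rationals."""
    A = [[Fraction(v) for v in row] for row in M]
    n = len(A)
    d = Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if A[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            A[c], A[pivot] = A[pivot], A[c]
            d = -d
        d *= A[c][c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for k in range(c, n):
                A[r][k] -= f * A[c][k]
    return d


def rank(A):
    """Determinant rank: the largest order of a non-vanishing minor."""
    m, n = len(A), len(A[0])
    for r in range(min(m, n), 0, -1):
        for rows in combinations(range(m), r):
            for cols in combinations(range(n), r):
                if det([[A[i][j] for j in cols] for i in rows]) != 0:
                    return r
    return 0


def is_consistent(A, b):
    """Theorem 5.3.1: Ax = b is consistent if and only if R(A) = R(B)."""
    B = [row + [bi] for row, bi in zip(A, b)]
    return rank(A) == rank(B)


# System (5.3.3): 3x + y - 5z = -1, x - 2y + z = -5, x + 5y - 7z = 2.
A = [[3, 1, -5], [1, -2, 1], [1, 5, -7]]
b = [-1, -5, 2]
B = [row + [bi] for row, bi in zip(A, b)]

print(det(A))                                      # 0
print(det([[3, 1, -1], [1, -2, -5], [1, 5, 2]]))   # 49 (columns 1, 2, 4 of B)
print(rank(A), rank(B))                            # 2 3
print(is_consistent(A, b))                         # False
```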

It should be noted that the method illustrated by this example 
has an obvious disadvantage. It depends on the evaluation of a 
number of determinants and so often involves a considerable 
amount of numerical work. This can be avoided by a more 
expeditious method of determining rank which will be described 
in the next chapter.f 

Theorem 5.3.1 does not, of course, exhaust the theory of linear
equations, for though it enables us to carry out tests for consistency,
it provides no method for finding solutions of a system of equations
or for determining the nature of the totality of the solutions.

Definition 5.3.2. Two systems of linear equations (in the same
unknowns) are equivalent if they have the same solutions.

Exercise 5.3.1. Show that the systems

$$x+y+z = 0, \quad 2x+y-2z = 1 \qquad\text{and}\qquad x+y+z = 0, \quad 3x+2y-z = 1$$

are equivalent.

Theorem 5.3.2. (Complete solution of a system of linear
equations)

Let $S$ be a consistent system of linear equations in $n$ unknowns, and
let $\Delta$ be a critical $r$-rowed minor of the matrix of coefficients. Then

(i) the $r$ equations of $S$ whose coefficients are involved in $\Delta$ form
a system $S'$ equivalent to $S$;

(ii) if arbitrary values are assigned to the $n-r$ ‘disposable’
unknowns in $S'$, whose coefficients are not elements of $\Delta$, then the
remaining $r$ unknowns are uniquely determined;

(iii) by assigning all possible sets of arbitrary values to the disposable
unknowns in $S'$ and determining in each case the remaining unknowns,
we obtain all solutions of the original system $S$.


t See § 6.2.2. 




For r = n the theorem must be interpreted as meaning that 
there are no disposable unknowns, and that all unknowns are 
determined uniquely. 

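Let (5.3.1) be the given system $S$ and assume that

$$\Delta = \begin{vmatrix} a_{11} & \dots & a_{1r} \\ \vdots & & \vdots \\ a_{r1} & \dots & a_{rr} \end{vmatrix} \neq 0.$$

The $r$ equations whose coefficients are involved in $\Delta$, namely,

$$\begin{aligned} a_{11}x_1+\dots+a_{1n}x_n &= b_1 \\ &\;\;\vdots \\ a_{r1}x_1+\dots+a_{rn}x_n &= b_r, \end{aligned}$$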

then constitute the system $S'$. Any solution of $S$ is trivially a
solution of $S'$. To prove the converse we may clearly assume that
$r < m$. Since $S$ is consistent by hypothesis, $\Delta$ is a critical minor of
the augmented matrix; hence, by Theorem 5.2.1, every row of the
augmented matrix is expressible linearly in terms of the first $r$ rows.
Thus there exist numbers $\lambda_{ik}$ $(i = r+1,\dots,m;\ k = 1,\dots,r)$ such that

$$a_{ij} = \sum_{k=1}^{r}\lambda_{ik}a_{kj} \quad (i = r+1,\dots,m;\ j = 1,\dots,n), \qquad b_i = \sum_{k=1}^{r}\lambda_{ik}b_k \quad (i = r+1,\dots,m).$$

Let $x_1,\dots,x_n$ be any solution of $S'$. Then, for $r < i \le m$,

$$\sum_{j=1}^{n}a_{ij}x_j = \sum_{j=1}^{n}\sum_{k=1}^{r}\lambda_{ik}a_{kj}x_j = \sum_{k=1}^{r}\lambda_{ik}\sum_{j=1}^{n}a_{kj}x_j = \sum_{k=1}^{r}\lambda_{ik}b_k = b_i,$$

and so $x_1,\dots,x_n$ is a solution of $S$. Hence $S$ and $S'$ are equivalent.
The reason why we obtain a solution of the entire system by
considering a solution of $r$ selected equations only is that the other
equations are, in fact, redundant and do not add anything to the
information provided by the $r$ selected ones.

We now consider the system $S'$. If $r = n$, then, since $\Delta \neq 0$, it
follows by Cramer’s rule (Theorem 5.1.2) that all the unknowns are
uniquely determined. If, on the other hand, $r < n$, then the
coefficients of $x_{r+1},\dots,x_n$ are not elements of $\Delta$. Assign any arbitrary




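values to these unknowns (which may be called disposable
unknowns) and rewrite $S'$ in the form

$$\begin{aligned} a_{11}x_1+\dots+a_{1r}x_r &= b_1-a_{1,r+1}x_{r+1}-\dots-a_{1n}x_n \\ &\;\;\vdots \\ a_{r1}x_1+\dots+a_{rr}x_r &= b_r-a_{r,r+1}x_{r+1}-\dots-a_{rn}x_n. \end{aligned}$$

The remaining $r$ unknowns are now seen to be uniquely
determined by Cramer’s rule.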

It remains to show that by giving all possible sets of values to
$x_{r+1},\dots,x_n$ and in each case determining the corresponding (unique)
values of $x_1,\dots,x_r$, we obtain all solutions of $S$. Let $y_1,\dots,y_n$ be any
solution of $S$, and so of $S'$. In the procedure described above we
may take $x_{r+1} = y_{r+1}, \dots, x_n = y_n$. We then obtain a solution
$y'_1,\dots,y'_r, y_{r+1},\dots,y_n$. But, in view of Theorem 5.1.2 (p. 134),
$y'_1 = y_1, \dots, y'_r = y_r$, and so our procedure does, indeed, yield $y_1,\dots,y_n$
as a solution of the system. The theorem is therefore established.


From the proof of Theorem 5.3.2 and from Theorem 5.1.2 it is
clear that the general solution of the system $S$ has been obtained
in the form

$$\begin{aligned} x_1 &= p_{1,r+1}\lambda_{r+1} + \cdots + p_{1n}\lambda_n + q_1, \\ &\;\;\vdots \\ x_r &= p_{r,r+1}\lambda_{r+1} + \cdots + p_{rn}\lambda_n + q_r, \\ x_{r+1} &= \lambda_{r+1}, \\ &\;\;\vdots \\ x_n &= \lambda_n, \end{aligned} \tag{5.3.4}$$

where $\lambda_{r+1}, \ldots, \lambda_n$ are parameters, i.e. numbers to which arbitrary
values may be assigned. The scheme (5.3.4) is not symmetrical
since in our discussion the position of a critical minor was chosen
in such a way that $x_{r+1}, \ldots, x_n$ became disposable unknowns. However,
the following formulation is independent of the position of
critical minors.


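Theorem 5.3.3. The general solution of a consistent system of
linear equations in the $n$ unknowns $x_1, \ldots, x_n$ is given by

$$\begin{aligned} x_1 &= p_{11}\lambda_1 + \cdots + p_{1s}\lambda_s + q_1, \\ &\;\;\vdots \\ x_n &= p_{n1}\lambda_1 + \cdots + p_{ns}\lambda_s + q_n. \end{aligned} \tag{5.3.5}$$

Here the $p_{ij}$, $q_i$ are constants;† $\lambda_1, \ldots, \lambda_s$ are parameters; and $s = n-r$,
where $r$ is the rank of the matrix of coefficients.

† These constants are not, of course, uniquely determined since they depend on
the choice of disposable unknowns.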







Thus all solutions depend linearly on $n-r$ parameters. We note
that (5.3.5) can be expressed in the form

$$\mathbf{x} = P\boldsymbol{\lambda} + \mathbf{q},$$

where $\mathbf{x} = (x_1, \ldots, x_n)^T$, $P$ is an $n \times s$ matrix, $\mathbf{q}$ a vector of order $n$,
and $\boldsymbol{\lambda}$ a variable vector of order $s$.


The information derived from Theorems 5.3.1 and 5.3.2 about 
the number of solutions of a system of linear equations may be 
summarized as follows. 


Theorem 5.3.4. Let $A$ be the matrix of coefficients and $B$ the
augmented matrix of a system of linear equations in $n$ unknowns. Then
the system possesses an infinity of solutions if and only if

$$R(A) = R(B) < n.$$

It possesses a unique solution if and only if

$$R(A) = R(B) = n.$$

It possesses no solution if and only if

$$R(A) < R(B).$$
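Theorem 5.3.4 turns the question "how many solutions?" into two rank computations. The following Python sketch is an illustration, not part of the text: it assumes exact rational arithmetic through the standard-library `fractions` module, and the helper names `rank` and `classify` are hypothetical. It is applied to the system (5.3.6) treated in § 5.3.2.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix given as a list of rows, by exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0                                   # number of pivots found so far
    for c in range(len(m[0]) if m else 0):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue                        # no pivot in this column
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):      # clear the entries below the pivot
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

def classify(A, b):
    """Theorem 5.3.4: compare R(A), R(B) and the number n of unknowns."""
    n = len(A[0])
    B = [list(row) + [bi] for row, bi in zip(A, b)]   # augmented matrix
    rA, rB = rank(A), rank(B)
    if rA < rB:
        return "no solution"
    return "unique solution" if rA == n else "infinitely many solutions"

# Coefficients and right-hand sides of the system (5.3.6)
A = [[3, -7, 14, -8], [1, -4, 3, -1], [0, 1, 1, -1], [2, -15, -1, 5]]
b = [24, -2, 6, -46]
print(rank(A), classify(A, b))   # 2 infinitely many solutions
```

Exact fractions are used deliberately: only rational operations occur (compare Theorem 5.3.5), so no rounding error can disguise a vanishing minor. Applied to a single equation $px = q$ (Exercise 5.3.2 (ii)), `classify([[0]], [1])`, `classify([[2]], [4])` and `classify([[0]], [0])` give the three possible cases.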


Exercise 5.3.2. Illustrate the above result by reference to (i) the
systems of equations mentioned immediately after Definition 5.1.2 (p. 132);
(ii) a single equation $px = q$.

Exercise 5.3.3. Suppose that a system of linear equations in n unknowns 
is consistent and that the rank of the matrix of coefficients is n. Show that 
the solution of the system is unique. 


Since the determination of solutions of systems of linear equations
involves only rational operations, we obtain the following
result, which is implicit in the previous work.

Theorem 5.3.5. If all the coefficients of a system of linear equations
belong to a reference field $\mathfrak{F}$ and if all the disposable unknowns (if any
exist) are also restricted to values in $\mathfrak{F}$, then all solutions of the system
are vectors over $\mathfrak{F}$.


5.3.2. As an illustration of the procedure established by
Theorem 5.3.2 let us consider the following system of equations in
4 unknowns:

$$\begin{aligned} 3x - 7y + 14z - 8w &= 24, \\ x - 4y + 3z - w &= -2, \\ y + z - w &= 6, \\ 2x - 15y - z + 5w &= -46. \end{aligned} \tag{5.3.6}$$



The augmented matrix of this system is

$$\begin{pmatrix} 3 & -7 & 14 & -8 & 24 \\ 1 & -4 & 3 & -1 & -2 \\ 0 & 1 & 1 & -1 & 6 \\ 2 & -15 & -1 & 5 & -46 \end{pmatrix}.$$

It can be verified that all 3-rowed minors of this matrix vanish.
There are, however, non-vanishing 2-rowed minors; one such minor is
formed by the coefficients of $x$ and $w$ in the second and third rows,

$$\begin{vmatrix} 1 & -1 \\ 0 & -1 \end{vmatrix} = -1.$$

It is, of course, critical
both for the matrix of coefficients and for the augmented matrix.
Thus the system (5.3.6) is consistent and we have, in fact, $m = 4$,
$n = 4$, $r = 2$. We may now discard the first and fourth equations
since none of their coefficients is an element of the critical minor we
are considering. In the system consisting of the second and third
equations, namely,

$$x - 4y + 3z - w = -2, \qquad y + z - w = 6,$$

the coefficients of $y$ and $z$ are not elements of the critical minor. We
therefore take $y$ and $z$ as the disposable unknowns, and giving them
the arbitrary values $\lambda$, $\mu$ respectively, obtain

$$x = 5\lambda - 2\mu - 8, \qquad w = \lambda + \mu - 6.$$

The general solution of (5.3.6) is therefore given by

$$x = 5\lambda - 2\mu - 8, \quad y = \lambda, \quad z = \mu, \quad w = \lambda + \mu - 6,$$

where $\lambda, \mu$ are parameters. In vector notation this may be written
as†

$$\begin{aligned} (x, y, z, w) &= (5\lambda - 2\mu - 8,\ \lambda,\ \mu,\ \lambda + \mu - 6) \\ &= \lambda(5, 1, 0, 1) + \mu(-2, 0, 1, 1) + (-8, 0, 0, -6). \end{aligned} \tag{5.3.7}$$

Here $(-8, 0, 0, -6)$ is a particular solution of (5.3.6) while

$$\lambda(5, 1, 0, 1) + \mu(-2, 0, 1, 1)$$

is the general solution of the associated homogeneous system.‡
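Since (5.3.7) is claimed to solve (5.3.6) for every choice of $\lambda, \mu$, a quick substitution check is possible. This is a minimal sketch, assuming exact rational arithmetic; `general_solution` and `residual` are illustrative names only.

```python
from fractions import Fraction

# The system (5.3.6): rows of coefficients of x, y, z, w and right-hand sides
A = [[3, -7, 14, -8], [1, -4, 3, -1], [0, 1, 1, -1], [2, -15, -1, 5]]
b = [24, -2, 6, -46]

def general_solution(lam, mu):
    """(5.3.7): lam*(5,1,0,1) + mu*(-2,0,1,1) + (-8,0,0,-6)."""
    return [5 * lam - 2 * mu - 8, lam, mu, lam + mu - 6]

def residual(x, rhs):
    """Return A x - rhs componentwise."""
    return [sum(a * xi for a, xi in zip(row, x)) - r for row, r in zip(A, rhs)]

# The particular solution plus any homogeneous part solves (5.3.6) ...
for lam, mu in [(0, 0), (1, 0), (0, 1), (Fraction(3, 7), -2)]:
    assert residual(general_solution(lam, mu), b) == [0, 0, 0, 0]

# ... and the two direction vectors solve the associated homogeneous system.
for v in ([5, 1, 0, 1], [-2, 0, 1, 1]):
    assert residual(v, [0, 0, 0, 0]) == [0, 0, 0, 0]
print("(5.3.7) checked")
```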
The form in which the general solution of a system of linear
equations is written out is not, as a rule, unique, since it depends on
the initial choice of a critical minor. In the case of the system
(5.3.6) just considered we might, for instance, discard the first and
second equations and consider the third and fourth only, since these
contain the critical minor

$$\begin{vmatrix} 1 & 1 \\ -15 & -1 \end{vmatrix} = 14.$$

In that case $x$ and $w$ become our disposable unknowns, and we
obtain

$$y = \tfrac{1}{7}x + \tfrac{2}{7}w + \tfrac{20}{7}, \qquad z = -\tfrac{1}{7}x + \tfrac{5}{7}w + \tfrac{22}{7}.$$

Putting $x = 7\rho$, $w = 7\sigma$, where $\rho, \sigma$ are arbitrary parameters, we
are led to the general solution of (5.3.6) in the form

$$(x, y, z, w) = \rho(7, 1, -1, 0) + \sigma(0, 2, 5, 7) + \left(0, \tfrac{20}{7}, \tfrac{22}{7}, 0\right). \tag{5.3.8}$$

† We use row vectors for typographical convenience.

‡ Compare Theorem 5.1.1 and also Theorem 5.4.2 below.

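This formula must, of course, yield the same set of solutions as
(5.3.7), and we may also verify this directly by putting

$$\lambda = \rho + 2\sigma + \tfrac{20}{7}, \qquad \mu = -\rho + 5\sigma + \tfrac{22}{7}$$

in (5.3.7), or

$$\rho = \tfrac{5}{7}\lambda - \tfrac{2}{7}\mu - \tfrac{8}{7}, \qquad \sigma = \tfrac{1}{7}\lambda + \tfrac{1}{7}\mu - \tfrac{6}{7}$$

in (5.3.8).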
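The two changes of parameter can likewise be checked mechanically. A small sketch, assuming exact fractions; the function names are illustrative and simply transcribe (5.3.7), (5.3.8) and the substitutions above.

```python
from fractions import Fraction as F

def sol_537(lam, mu):
    """General solution (5.3.7) as a tuple (x, y, z, w)."""
    return (5 * lam - 2 * mu - 8, lam, mu, lam + mu - 6)

def sol_538(rho, sigma):
    """General solution (5.3.8) as a tuple (x, y, z, w)."""
    return (7 * rho, rho + 2 * sigma + F(20, 7), -rho + 5 * sigma + F(22, 7), 7 * sigma)

def to_lam_mu(rho, sigma):
    return (rho + 2 * sigma + F(20, 7), -rho + 5 * sigma + F(22, 7))

def to_rho_sigma(lam, mu):
    return (F(5, 7) * lam - F(2, 7) * mu - F(8, 7), F(1, 7) * lam + F(1, 7) * mu - F(6, 7))

# Each substitution carries one parametrization exactly onto the other.
for rho, sigma in [(0, 0), (1, -1), (F(2, 3), 5)]:
    assert sol_538(rho, sigma) == sol_537(*to_lam_mu(rho, sigma))
for lam, mu in [(0, 0), (4, 1), (F(-1, 2), 3)]:
    assert sol_537(lam, mu) == sol_538(*to_rho_sigma(lam, mu))
print("both changes of parameter agree")
```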

Exercise 5.3.4. Obtain the general solution of the system (5.3.6) by
discarding the second and third equations, and taking $z$, $w$ as the disposable
unknowns.

The solution of a system of linear equations by the method just 
illustrated can result in a great deal of tedious work since it involves 
the evaluation of numerous determinants. A better practical 
procedure is to treat the given set of equations according to the rules 
familiar from elementary mathematics, i.e. by combining equations 
linearly and obtaining a simpler system equivalent to the original 
one. The justification of this rule lies in the fact that if some 
multiple of an equation is added to another equation of the system, 
the resulting system is equivalent to the original system. It follows, 
therefore, in particular, that redundant equations can be discarded. 

In the case of the system (5.3.6) we may, for instance, proceed as
follows. Subtracting the third equation from the second we obtain
(5.3.6) in the equivalent form

$$\begin{aligned} 3x - 7y + 14z - 8w &= 24, \\ x - 5y + 2z &= -8, \\ y + z - w &= 6, \\ 2x - 15y - z + 5w &= -46. \end{aligned}$$

Next, we add 5 times the third equation to the fourth, and obtain

$$\begin{aligned} 3x - 7y + 14z - 8w &= 24, \\ x - 5y + 2z &= -8, \\ y + z - w &= 6, \\ 2x - 10y + 4z &= -16. \end{aligned}$$

Here the fourth equation is simply equal to twice the second. It
may therefore be discarded, and the system is equivalent to

$$\begin{aligned} 3x - 7y + 14z - 8w &= 24, \\ x - 5y + 2z &= -8, \\ y + z - w &= 6. \end{aligned}$$

Subtracting the second equation from the first and multiplying the
result by $\tfrac{1}{2}$ we have

$$\begin{aligned} x - y + 6z - 4w &= 16, \\ x - 5y + 2z &= -8, \\ y + z - w &= 6. \end{aligned}$$

Here the second equation is equal to the difference between the
first and 4 times the third. Hence it may be discarded, and we are
consequently left with the system

$$x - y + 6z - 4w = 16, \qquad y + z - w = 6.$$

We may take $y$, $z$ as the disposable unknowns; and putting $y = \lambda$,
$z = \mu$, where $\lambda, \mu$ are parameters, we are at once led to the general
solution in the form (5.3.7).
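The elimination just carried out by hand is, in effect, Gauss-Jordan reduction. The sketch below is our own illustration rather than the text's step-by-step procedure: it reduces the augmented matrix of (5.3.6) to reduced row echelon form with exact fractions. Its pivot rule (first non-zero entry, column by column) makes $x$ and $y$ the leading unknowns, so here $z$ and $w$ play the part of the disposable unknowns.

```python
from fractions import Fraction

def rref(rows):
    """Reduced row echelon form of a matrix (list of rows) and its pivot columns."""
    m = [[Fraction(v) for v in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0]) if m else 0):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue                                  # no pivot in column c
        m[r], m[p] = m[p], m[r]
        piv = m[r][c]
        m[r] = [v / piv for v in m[r]]                # scale the pivot to 1
        for i in range(len(m)):
            if i != r and m[i][c] != 0:               # clear the rest of column c
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

# Augmented matrix of (5.3.6): columns x, y, z, w | right-hand side
aug = [[3, -7, 14, -8, 24], [1, -4, 3, -1, -2],
       [0, 1, 1, -1, 6], [2, -15, -1, 5, -46]]
R, pivots = rref(aug)
for row in R:
    print([str(v) for v in row])
# ['1', '0', '7', '-5', '22']
# ['0', '1', '1', '-1', '6']
# ['0', '0', '0', '0', '0']
# ['0', '0', '0', '0', '0']
```

The two non-zero rows say $x = 22 - 7z + 5w$ and $y = 6 - z + w$; putting $z = \mu$ and $w = \lambda + \mu - 6$ gives back exactly (5.3.7).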

In § 6.3.2 we shall see how the method for solving systems of 
linear equations illustrated above can be used with even less labour. 

5.4. Systems of homogeneous linear equations 
5.4.1. All conclusions of §5.3.1 are, of course, valid for the 
special case of homogeneous systems. There are, however, certain 
results that relate to homogeneous systems only, and we propose 
next to deal with these results. 

The question as to the existence of non-trivial solutions is settled 
at once by the previous theory. 

Theorem 5.4.1. The homogeneous system $A\mathbf{x} = \mathbf{0}$ in $n$ unknowns
possesses a non-trivial solution if and only if $R(A) < n$.

The matrix of coefficients and the augmented matrix of a homogeneous
system obviously have the same rank. Hence, by Theorem
5.3.4, the system possesses an infinity of solutions, and hence at
least one non-trivial solution, if and only if $R(A) < n$.

Corollary 1. A homogeneous system in which the number of
equations is equal to the number of unknowns possesses a non-trivial
solution if and only if the matrix of coefficients is singular.

This result is, of course, identical with Theorem 1.6.1.

Corollary 2. If a homogeneous system has fewer equations than
unknowns, then it possesses a non-trivial solution.

The next result furnishes us with important new information 
concerning the totality of solutions of a homogeneous system. 


Theorem 5.4.2. (Dimensionality theorem for homogeneous
systems.)

The solutions of the homogeneous system

$$A\mathbf{x} = \mathbf{0} \tag{5.4.1}$$

in $n$ unknowns constitute a vector space of dimensionality $n - R(A)$.

Let $\mathfrak{V}$ be the set of all solutions of (5.4.1). If $\mathbf{x}, \mathbf{y} \in \mathfrak{V}$, then
clearly $\alpha\mathbf{x},\ \mathbf{x}+\mathbf{y} \in \mathfrak{V}$, and so $\mathfrak{V}$ is a vector space. We write $R(A) = r$
and have now to prove

$$d(\mathfrak{V}) = n - r. \tag{5.4.2}$$


We shall give three proofs of this important result. 

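(i) Write $d(\mathfrak{V}) = k$. It follows by Theorem 5.4.1 that $k = 0$
implies $r = n$, while obviously $k = n$ implies $r = 0$. We may
therefore assume that $0 < k < n$.

Let $\{\mathbf{x}_1, \ldots, \mathbf{x}_k\}$ be a basis of $\mathfrak{V}$. By the Corollary to Theorem 2.3.5
(p. 54), there exist vectors $\mathbf{x}_{k+1}, \ldots, \mathbf{x}_n$ such that

$$\{\mathbf{x}_1, \ldots, \mathbf{x}_k, \mathbf{x}_{k+1}, \ldots, \mathbf{x}_n\}$$

is a basis of $\mathfrak{V}_n$. The vectors $A\mathbf{x}_1, \ldots, A\mathbf{x}_n$ therefore span the space
$\mathfrak{V}'$ of vectors of the form $A\mathbf{x}$ ($\mathbf{x} \in \mathfrak{V}_n$). But $A\mathbf{x}_1 = \cdots = A\mathbf{x}_k = \mathbf{0}$,
and so the vectors

$$A\mathbf{x}_{k+1}, \ldots, A\mathbf{x}_n \tag{5.4.3}$$

span $\mathfrak{V}'$. Let the scalars $\alpha_{k+1}, \ldots, \alpha_n$ be such that

$$\alpha_{k+1}A\mathbf{x}_{k+1} + \cdots + \alpha_nA\mathbf{x}_n = \mathbf{0},$$

i.e. $A(\alpha_{k+1}\mathbf{x}_{k+1} + \cdots + \alpha_n\mathbf{x}_n) = \mathbf{0}$.
This means that $\alpha_{k+1}\mathbf{x}_{k+1} + \cdots + \alpha_n\mathbf{x}_n \in \mathfrak{V}$. But $\{\mathbf{x}_1, \ldots, \mathbf{x}_k\}$ is a
basis of $\mathfrak{V}$, and so, for suitable scalars $\alpha_1, \ldots, \alpha_k$,

$$\alpha_{k+1}\mathbf{x}_{k+1} + \cdots + \alpha_n\mathbf{x}_n = \alpha_1\mathbf{x}_1 + \cdots + \alpha_k\mathbf{x}_k.$$

Since $\mathbf{x}_1, \ldots, \mathbf{x}_n$ are linearly independent this implies, in particular,
that $\alpha_{k+1} = \cdots = \alpha_n = 0$.

The vectors (5.4.3) are therefore linearly independent and constitute
a basis of $\mathfrak{V}'$; hence $d(\mathfrak{V}') = n-k$. But, by Theorem 5.2.3
(p. 140), $d(\mathfrak{V}') = r$, and (5.4.2) is therefore proved.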

Though in the proof just given we speak of the matrix $A$ rather
than of a linear transformation, the argument is, in fact, of the
'invariant' type; and of this the reader should have no difficulty
in satisfying himself. The 'invariant' restatement of Theorem 5.4.2
is as follows. Let $L$ be a linear mapping† of $\mathfrak{V}_n$ into $\mathfrak{V}_m$. Denote
by $\mathfrak{U}$ the vector space of vectors in $\mathfrak{V}_m$ which are images of vectors
in $\mathfrak{V}_n$, and by $\mathfrak{U}'$ the vector space of vectors in $\mathfrak{V}_n$ which map into
the zero vector of $\mathfrak{V}_m$. Then $d(\mathfrak{U}') = n - d(\mathfrak{U})$.


(ii) The next proof depends on the general theory of § 5.3.1. If $r = 0$,
then clearly $\mathfrak{V} = \mathfrak{V}_n$; if $r = n$, then $\mathfrak{V}$ is the null space. In these two
cases (5.4.2) is therefore valid and we may now assume that $0 < r < n$.

In view of the discussion preceding Theorem 5.3.3 (p. 144), we know that
the general solution of (5.4.1) is given by

$$\mathbf{x} = P\boldsymbol{\lambda}, \tag{5.4.4}$$

where $\boldsymbol{\lambda}$ is an arbitrary vector of order $n-r$ and $P$ is an $n \times (n-r)$ matrix of
the form

$$P = \begin{pmatrix} p_{1,r+1} & \cdots & p_{1n} \\ \vdots & & \vdots \\ p_{r,r+1} & \cdots & p_{rn} \\ 1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 1 \end{pmatrix}.$$

Since $P$ contains a non-vanishing $(n-r)$-rowed minor (namely, that consisting
of its last $n-r$ rows), it follows that $R(P) = n-r$. But, by (5.4.4)
and Theorem 5.2.3, $d(\mathfrak{V}) = R(P)$. Hence (5.4.2) is valid.

(iii) Finally, we give a short proof depending on orthogonal complements.
It is obvious that the vector $\mathbf{x}$ satisfies the equation $A\mathbf{x} = \mathbf{0}$ if and only if it
is orthogonal to every column of the matrix $\bar{A}^T$.‡ This implies that $\mathbf{x} \in \mathfrak{V}$
if and only if $\mathbf{x}$ is orthogonal to every vector of the space $\mathfrak{W}$ generated by
the columns of $\bar{A}^T$. Thus $\mathfrak{V}$ and $\mathfrak{W}$ are orthogonal complements and so, by
Theorem 2.5.7 (iii) (p. 68), $d(\mathfrak{V}) + d(\mathfrak{W}) = n$. But $d(\mathfrak{W}) = R(\bar{A}^T) = r$,
and (5.4.2) follows.

Exercise 5.4.1. Let $A$ be the matrix of coefficients of a consistent
system of linear equations in $n$ unknowns. Show that there exists a vector
space $\mathfrak{V}$, of order $n$ and dimensionality $n - R(A)$, such that the general
solution of the system is of the form $\mathbf{x}_0 + \boldsymbol{\xi}$, where $\mathbf{x}_0$ is any particular
solution and $\boldsymbol{\xi}$ is an arbitrary vector in $\mathfrak{V}$.

† $L$ corresponds, of course, to $A$.

‡ If $A = (a_{ij})$, then $\bar{A}$ denotes the matrix $(\bar{a}_{ij})$.

5.4.2. Definition 5.4.1. A set of solutions $\mathbf{x}_1, \ldots, \mathbf{x}_k$ of a
homogeneous system is a fundamental set of solutions if (i)
$\mathbf{x}_1, \ldots, \mathbf{x}_k$ are linearly independent, and (ii) every solution of the
system is expressible as a linear combination of $\mathbf{x}_1, \ldots, \mathbf{x}_k$.

In fact, a set of solutions is a fundamental set if and only if it is a
basis of the vector space of all solutions. By Theorem 5.4.2 it
follows that, if $R(A) = r$, then any $n-r$ linearly independent
solutions of the system $A\mathbf{x} = \mathbf{0}$ in $n$ unknowns form a fundamental
set.

It is easy to see how a fundamental set may be constructed. Let
us suppose, to fix our ideas, that $x_{r+1}, \ldots, x_n$ are taken as the disposable
unknowns. Then the $n-r$ uniquely defined solutions
corresponding to

$$\begin{array}{llll} x_{r+1} = 1, & x_{r+2} = 0, & \ldots, & x_n = 0; \\ x_{r+1} = 0, & x_{r+2} = 1, & \ldots, & x_n = 0; \\ \quad\vdots & \quad\vdots & & \quad\vdots \\ x_{r+1} = 0, & x_{r+2} = 0, & \ldots, & x_n = 1 \end{array}$$

constitute a fundamental set.

Exercise 5.4.2. Prove the statement just made, and use it as a basis for
a proof of Theorem 5.4.2.

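Consider, by way of illustration, the homogeneous system

$$6x + y + z + w = 0, \qquad 16x + y - z + 5w = 0, \qquad 7x + 2y + 3z = 0. \tag{5.4.5}$$

The matrix of coefficients is

$$\begin{pmatrix} 6 & 1 & 1 & 1 \\ 16 & 1 & -1 & 5 \\ 7 & 2 & 3 & 0 \end{pmatrix}.$$

This has rank 2, and a critical minor is formed by the coefficients of $z$ and $w$
in the first and third rows, namely $\begin{vmatrix} 1 & 1 \\ 3 & 0 \end{vmatrix} = -3$.
Discarding, as we may, the second equation, and treating $x$, $y$ as
the disposable unknowns, we rewrite (5.4.5) in the equivalent form

$$z + w = -6x - y, \qquad 3z = -7x - 2y.$$

If $x = 1$, $y = 0$, then $z = -\tfrac{7}{3}$, $w = -\tfrac{11}{3}$; if $x = 0$, $y = 1$, then
$z = -\tfrac{2}{3}$, $w = -\tfrac{1}{3}$. Hence the vectors

$$\left(1, 0, -\tfrac{7}{3}, -\tfrac{11}{3}\right), \qquad \left(0, 1, -\tfrac{2}{3}, -\tfrac{1}{3}\right),$$

or, equally well, their multiples $(3, 0, -7, -11)$ and $(0, 3, -2, -1)$,
constitute a fundamental set of solutions of (5.4.5). The general
solution may therefore be written in the form

$$(x, y, z, w) = \lambda(3, 0, -7, -11) + \mu(0, 3, -2, -1),$$

where $\lambda, \mu$ are parameters.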
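The construction of a fundamental set, with the disposable unknowns given the values $(1, 0, \ldots, 0), \ldots, (0, \ldots, 0, 1)$, can be sketched as follows. This is an illustration under stated assumptions: exact rational arithmetic, disposable unknowns passed as a list of column indices, and the hypothetical helpers `rref` and `fundamental_set`. To make the remaining unknowns the leading ones, the sketch moves the disposable columns to the end before reducing; it rejects a choice whose complementary columns do not carry a critical minor.

```python
from fractions import Fraction

def rref(rows):
    """Reduced row echelon form of a matrix (list of rows) and its pivot columns."""
    m = [[Fraction(v) for v in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0]) if m else 0):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue                                  # no pivot in column c
        m[r], m[p] = m[p], m[r]
        piv = m[r][c]
        m[r] = [v / piv for v in m[r]]                # scale the pivot to 1
        for i in range(len(m)):
            if i != r and m[i][c] != 0:               # clear the rest of column c
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

def fundamental_set(A, disposable):
    """Fundamental set of A x = 0 obtained by giving the disposable unknowns
    (a list of column indices) the values (1,0,...,0), ..., (0,...,0,1)."""
    n = len(A[0])
    others = [j for j in range(n) if j not in disposable]
    order = others + list(disposable)                 # disposable columns last
    R, pivots = rref([[row[j] for j in order] for row in A])
    r = len(pivots)
    if r != len(others) or pivots != list(range(r)):
        raise ValueError("the other unknowns must carry a critical minor")
    solutions = []
    for t, d in enumerate(disposable):
        x = [Fraction(0)] * n
        x[d] = Fraction(1)
        for i, j in enumerate(others):
            # row i of R reads  x_j + sum_s R[i][r + s] * x_{disposable[s]} = 0
            x[j] = -R[i][r + t]
        solutions.append(x)
    return solutions

A = [[6, 1, 1, 1], [16, 1, -1, 5], [7, 2, 3, 0]]     # matrix of (5.4.5)
for v in fundamental_set(A, [0, 1]):                 # x and y disposable
    print([str(c) for c in v])
# ['1', '0', '-7/3', '-11/3']
# ['0', '1', '-2/3', '-1/3']
```

Multiplying both vectors by 3 gives the fundamental set $(3, 0, -7, -11)$, $(0, 3, -2, -1)$ used above.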

The importance of fundamental sets of solutions derives, of 
course, from the fact that any solution can be represented as a 
unique linear combination of the solutions of a fundamental set. 
We recall that an analogous result is valid for the linear differential
equation

$$\frac{d^n x}{dt^n} + a_1\frac{d^{n-1} x}{dt^{n-1}} + \cdots + a_n x = 0,$$

any solution of which can be represented as a unique linear combination
of $n$ fixed linearly independent solutions.


5.5. Miscellaneous applications

5.5.1. The notion of linear dependence of vectors was introduced
in § 2.3.1, but at that stage we had no means of testing effectively
whether the vectors of a given set were linearly dependent
or not. We are now able to deduce simple criteria for deciding these
questions.

By a matrix of a set of vectors (all of the same order) we shall
mean any matrix having these vectors as its rows (or columns).†

Theorem 5.5.1. Let $A$ be a matrix of $m$ vectors of order $n$. A
necessary and sufficient condition for these vectors to be linearly
dependent is that $R(A) < m$.

Let $\mathbf{x}_1, \ldots, \mathbf{x}_m$ be the given vectors and let $A$ be the matrix of
which they are the rows. Denoting by $\mu$ the maximum number of
linearly independent vectors among $\mathbf{x}_1, \ldots, \mathbf{x}_m$, we have $\mu = R(A)$.
Now $\mathbf{x}_1, \ldots, \mathbf{x}_m$ are obviously linearly independent or linearly
dependent according as $\mu = m$ or $\mu < m$, and the assertion therefore
follows.

Theorem 5.5.1 states, in particular, that if $n < m$, then any $m$
vectors of order $n$ are linearly dependent (for then $R(A) \le n < m$), a result
with which we are, of course, already familiar.

If $m = n$, the condition $R(A) < m$ means that $|A| = 0$. We
therefore have the following consequence of Theorem 5.5.1.

† We speak here of 'a matrix' rather than 'the matrix', since the order in
which the vectors are taken is not laid down.

Corollary. A necessary and sufficient condition for $n$ vectors of
order $n$ to be linearly dependent is that a determinant formed by their
components should vanish.

A problem closely allied to the problem of linear dependence of 
vectors is concerned with the linear dependence of linear forms. 

Definition 5.5.1. A linear form $L$ in the variables $x_1, \ldots, x_n$ is
a polynomial of the type

$$L = L(x_1, \ldots, x_n) = a_1x_1 + \cdots + a_nx_n.$$

Exercise 5.5.1. Show that, with respect to obvious definitions of
multiplication by scalars and addition, the set of linear forms in $n$ variables
is a linear manifold of dimensionality $n$.

It is easy to see that linear forms can be used to 'represent' linear
transformations of a linear manifold into its reference field. We
leave it to the reader to supply the details.

Definition 5.5.2. The linear forms $L_1, \ldots, L_m$ in the variables $x_1, \ldots, x_n$ are
linearly dependent if there exist numbers $\lambda_1, \ldots, \lambda_m$, not all zero, such that,
identically in $x_1, \ldots, x_n$,

$$\lambda_1L_1 + \cdots + \lambda_mL_m = 0.$$

Theorem 5.5.2. A necessary and sufficient condition for $m$ linear
forms in $n$ variables to be linearly dependent is that $R(A) < m$, where
$A$ is a matrix of the linear forms.†

Let

$$L_i = a_{i1}x_1 + \cdots + a_{in}x_n \quad (i = 1, \ldots, m)$$

be the given linear forms. It is seen at once that they are linearly
dependent if and only if the $m$ vectors $(a_{i1}, \ldots, a_{in})$ $(i = 1, \ldots, m)$ are
linearly dependent. The assertion now follows as an immediate
consequence of Theorem 5.5.1.

It should be observed that Theorems 5.5.1 and 5.5.2 are not based
on the detailed theory of linear equations and could have been
proved immediately after § 5.2.

5.5.2. The question concerning the characterization of divisors
of zero in matrix algebra, which was first raised in § 3.6.2, can be
settled now without difficulty.

Theorem 5.5.3. A (non-zero) matrix $A$ of type $m \times n$ is a divisor
of zero if and only if $R(A) < \max(m, n)$.

† By a matrix of a set of linear forms (in the same variables) we mean, of course,
any matrix formed by the array of their coefficients.

This statement includes Theorem 3.6.8 (p. 96) and shows further
that a (non-zero) non-square matrix is necessarily a divisor of zero, since
then $R(A) \le \min(m, n) < \max(m, n)$.

We recall that a matrix $A \neq O$ is called a divisor of zero if and
only if there exists a matrix $X \neq O$ such that $AX = O$ or a matrix
$Y \neq O$ such that $YA = O$. In the former case $X$ possesses a non-zero
column, say $X_{*k} = \mathbf{x}$. Then $(AX)_{*k} = \mathbf{0}$, and so, in view of
Theorem 3.3.6 (ii) (p. 84),

$$A\mathbf{x} = \mathbf{0}, \quad \mathbf{x} \neq \mathbf{0}. \tag{5.5.1}$$

In the latter case $Y$ possesses a non-zero row, say $Y_{k*} = \mathbf{y}^T$. Then
$(YA)_{k*} = \mathbf{0}$, and so, in view of Theorems 3.3.5 (i) and 3.6.9 (p. 97),

$$A^T\mathbf{y} = \mathbf{0}, \quad \mathbf{y} \neq \mathbf{0}. \tag{5.5.2}$$

It follows that $A$ is a divisor of zero if and only if there exists a
vector $\mathbf{x}$ satisfying (5.5.1) or a vector $\mathbf{y}$ satisfying (5.5.2). But, by
Theorem 5.4.1, a vector $\mathbf{x}$ of the required type exists if and only if
$R(A) < n$, and a vector $\mathbf{y}$ of the required type exists if and only
if $R(A) = R(A^T) < m$. Hence $A$ is a divisor of zero if and only if
$R(A) < \max(m, n)$.
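Theorem 5.5.3 reduces the divisor-of-zero question to a single rank computation. A minimal sketch, assuming exact rational arithmetic; `rank` and `is_divisor_of_zero` are illustrative helper names, not notation from the text.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix given as a list of rows, by exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):      # clear the entries below the pivot
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

def is_divisor_of_zero(A):
    """Theorem 5.5.3: a non-zero m x n matrix is a divisor of zero
    exactly when R(A) < max(m, n)."""
    if all(v == 0 for row in A for v in row):
        raise ValueError("Theorem 5.5.3 concerns non-zero matrices")
    return rank(A) < max(len(A), len(A[0]))

print(is_divisor_of_zero([[1, 2], [2, 4]]))          # True: singular square matrix
print(is_divisor_of_zero([[1, 0], [0, 1]]))          # False: non-singular
print(is_divisor_of_zero([[1, 0, 0], [0, 1, 0]]))    # True: non-square
```

For the first matrix a witness is easy to exhibit: with $X = (2, -1)^T$ we get $AX = O$.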

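5.5.3. We shall next give an interesting proof that if $n$ (complex)
non-zero vectors of order $n$ form an orthogonal set, then they are
linearly independent, a result which is a special case of Theorem
2.5.4 (p. 66).

Let the vectors in question be denoted by

$$\mathbf{x}_i = (x_{i1}, \ldots, x_{in}) \quad (i = 1, \ldots, n),$$

and consider the $n$-rowed determinant $D = |x_{ij}|$. Then, multiplying
$\bar{D}$ and $D$ rows by rows, we obtain

$$|D|^2 = \begin{vmatrix} \bar{x}_{11} & \cdots & \bar{x}_{1n} \\ \vdots & & \vdots \\ \bar{x}_{n1} & \cdots & \bar{x}_{nn} \end{vmatrix} \cdot \begin{vmatrix} x_{11} & \cdots & x_{1n} \\ \vdots & & \vdots \\ x_{n1} & \cdots & x_{nn} \end{vmatrix} = \begin{vmatrix} (\mathbf{x}_1,\mathbf{x}_1) & \cdots & (\mathbf{x}_1,\mathbf{x}_n) \\ \vdots & & \vdots \\ (\mathbf{x}_n,\mathbf{x}_1) & \cdots & (\mathbf{x}_n,\mathbf{x}_n) \end{vmatrix}$$

$$= \begin{vmatrix} (\mathbf{x}_1,\mathbf{x}_1) & 0 & \cdots & 0 \\ 0 & (\mathbf{x}_2,\mathbf{x}_2) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & (\mathbf{x}_n,\mathbf{x}_n) \end{vmatrix} = (\mathbf{x}_1,\mathbf{x}_1)(\mathbf{x}_2,\mathbf{x}_2)\cdots(\mathbf{x}_n,\mathbf{x}_n).$$

Since each $\mathbf{x}_i$ is non-zero, every factor $(\mathbf{x}_i, \mathbf{x}_i)$ is positive.
Hence $D \neq 0$ and so, by the corollary to Theorem 5.5.1, $\mathbf{x}_1, \ldots, \mathbf{x}_n$
are linearly independent.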
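The identity $|D|^2 = (\mathbf{x}_1,\mathbf{x}_1)\cdots(\mathbf{x}_n,\mathbf{x}_n)$ can be checked on a small example. This sketch assumes real vectors, so that $(\mathbf{x}, \mathbf{y}) = \sum_k x_k y_k$ and no conjugation is needed; `det` and `inner` are illustrative helpers, and the three vectors are an arbitrary orthogonal set chosen for the check.

```python
from fractions import Fraction

def det(rows):
    """Determinant by exact Gaussian elimination; each row swap flips the sign."""
    m = [[Fraction(v) for v in row] for row in rows]
    n, d = len(m), Fraction(1)
    for c in range(n):
        p = next((i for i in range(c, n) if m[i][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            m[c], m[p] = m[p], m[c]
            d = -d
        d *= m[c][c]
        for i in range(c + 1, n):
            f = m[i][c] / m[c][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[c])]
    return d

def inner(x, y):
    """(x, y) for real vectors; complex vectors would need conj(x_k) * y_k."""
    return sum(a * b for a, b in zip(x, y))

X = [[1, 1, 0], [1, -1, 0], [0, 0, 2]]     # rows x_1, x_2, x_3: an orthogonal set
gram = [[inner(u, v) for v in X] for u in X]
D = det(X)
product = 1
for i in range(len(X)):
    product *= gram[i][i]
print(gram)                # [[2, 0, 0], [0, 2, 0], [0, 0, 4]]
print(D, D * D, product)   # -4 16 16
```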

Determinants such as $|(\mathbf{x}_i, \mathbf{x}_j)|$ occur frequently in algebra, and
are often referred to as Gram determinants, after J. P. Gram
(1850-1916).

Definition 5.5.3. Let $A$ be a rectangular matrix. Then the
(square) matrix $G = \bar{A}^TA$ is known as the Gram matrix of $A$, and
$|G|$ as the Gram determinant of $A$.

Our procedure in the argument above consisted essentially in
constructing the Gram matrix of the matrix having $\mathbf{x}_1, \ldots, \mathbf{x}_n$ as
its columns.

Exercise 5.5.2. Let $A$ be the $m \times n$ matrix having the vectors $\mathbf{x}_1, \ldots, \mathbf{x}_n$,
of order $m$, as its columns and let $G$ be the Gram matrix of $A$. Show that

$$G_{ij} = (\mathbf{x}_i, \mathbf{x}_j) \quad (i, j = 1, \ldots, n).$$

There is a simple connexion between the rank of any matrix and 
the rank of its Gram matrix. 

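Theorem 5.5.4. If $G$ is the Gram matrix of $A$, then $R(G) = R(A)$.

Let $\mathbf{x}$ be a vector such that $\bar{A}^TA\mathbf{x} = \mathbf{0}$. Then $\bar{\mathbf{x}}^T\bar{A}^TA\mathbf{x} = 0$,
and so $(\overline{A\mathbf{x}})^TA\mathbf{x} = 0$, i.e. $|A\mathbf{x}|^2 = 0$. We therefore have $A\mathbf{x} = \mathbf{0}$.
Conversely, if $\mathbf{x}$ satisfies the relation $A\mathbf{x} = \mathbf{0}$, then clearly
$\bar{A}^TA\mathbf{x} = \mathbf{0}$. It follows that the homogeneous systems $A\mathbf{x} = \mathbf{0}$
and $G\mathbf{x} = \mathbf{0}$ are equivalent. Hence, by Theorem 5.4.2,

$$n - R(A) = n - R(G),$$

where $n$ is the number of columns of $A$. The theorem is therefore
proved.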
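Theorem 5.5.4 can be illustrated numerically. The sketch assumes a real matrix, in which case $\bar{A}^TA = A^TA$; the helper names `rank` and `gram` are illustrative, and the sample matrix is an arbitrary choice whose third column is the sum of the first two.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix given as a list of rows, by exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):      # clear the entries below the pivot
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

def gram(A):
    """Gram matrix of a real matrix A; for real entries conj(A)^T A = A^T A."""
    cols = list(zip(*A))
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]

A = [[1, 2, 3], [2, 4, 6], [1, 0, 1], [0, 2, 2]]   # 4 x 3, rank 2
G = gram(A)                                        # 3 x 3
print(rank(A), rank(G))   # 2 2
```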

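5.5.4. The theory of homogeneous linear equations enables us
to give an alternative proof of the theorem on orthonormal bases
(Theorem 2.5.5, p. 66). This proof is based on Theorem 5.4.1
(p. 148) and makes no use of an explicit construction such as that
furnished by Schmidt's orthogonalization process.

Let $\mathbf{x}_1, \ldots, \mathbf{x}_k$ be an orthonormal set in a vector space $\mathfrak{V}$ of dimensionality
$r$. We have to show that this set may be augmented in
such a way as to become an orthonormal basis of $\mathfrak{V}$. We may
assume that $1 \le k < r$.

By the corollary to Theorem 2.3.5 (p. 54) there exist vectors
$\mathbf{y}_{k+1}, \ldots, \mathbf{y}_r \in \mathfrak{V}$ such that $\{\mathbf{x}_1, \ldots, \mathbf{x}_k, \mathbf{y}_{k+1}, \ldots, \mathbf{y}_r\}$ is a basis of $\mathfrak{V}$. Now
it is possible to choose a non-zero vector $\mathbf{x}_{k+1} \in \mathfrak{V}$ such that

$$(\mathbf{x}_1, \mathbf{x}_{k+1}) = \cdots = (\mathbf{x}_k, \mathbf{x}_{k+1}) = 0. \tag{5.5.3}$$

For, if $\mathbf{x}_{k+1} = \alpha_1\mathbf{x}_1 + \cdots + \alpha_k\mathbf{x}_k + \alpha_{k+1}\mathbf{y}_{k+1} + \cdots + \alpha_r\mathbf{y}_r$,
then (5.5.3) is equivalent to the system of equations

$$\begin{aligned} \alpha_1(\mathbf{x}_1,\mathbf{x}_1) + \cdots + \alpha_k(\mathbf{x}_1,\mathbf{x}_k) + \alpha_{k+1}(\mathbf{x}_1,\mathbf{y}_{k+1}) + \cdots + \alpha_r(\mathbf{x}_1,\mathbf{y}_r) &= 0, \\ &\;\;\vdots \\ \alpha_1(\mathbf{x}_k,\mathbf{x}_1) + \cdots + \alpha_k(\mathbf{x}_k,\mathbf{x}_k) + \alpha_{k+1}(\mathbf{x}_k,\mathbf{y}_{k+1}) + \cdots + \alpha_r(\mathbf{x}_k,\mathbf{y}_r) &= 0, \end{aligned}$$

and, since $k < r$, there exist, by Corollary 2 to Theorem 5.4.1,
values of $\alpha_1, \ldots, \alpha_r$, not all zero, satisfying these equations. Because
$\mathbf{x}_1, \ldots, \mathbf{x}_k, \mathbf{y}_{k+1}, \ldots, \mathbf{y}_r$ are linearly independent, the corresponding
vector $\mathbf{x}_{k+1}$ is then non-zero.

Since $\mathbf{x}_{k+1} \neq \mathbf{0}$, it may be normalized; and it follows that if
$\mathbf{x}_1, \ldots, \mathbf{x}_k$ is an orthonormal set in $\mathfrak{V}$ and $k < d(\mathfrak{V})$, then there exists
a vector $\mathbf{x}_{k+1} \in \mathfrak{V}$ such that the augmented set $\mathbf{x}_1, \ldots, \mathbf{x}_k, \mathbf{x}_{k+1}$ is
again orthonormal. The proof of the theorem is now completed by
a repeated application of this result.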
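The extension step above can be imitated computationally: each new vector is a non-trivial solution of a homogeneous system with fewer equations than unknowns. The sketch below makes simplifying assumptions that are ours, not the text's: $\mathfrak{V}$ is the whole space of real vectors of order $n$ (so coordinates can be used directly instead of the basis $\mathbf{x}_1, \ldots, \mathbf{y}_r$), and normalization is omitted, since dividing by a length such as $\sqrt{2}$ would leave exact rational arithmetic. The helper names are hypothetical.

```python
from fractions import Fraction

def rref(rows):
    """Reduced row echelon form of a matrix (list of rows) and its pivot columns."""
    m = [[Fraction(v) for v in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0]) if m else 0):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue                                  # no pivot in column c
        m[r], m[p] = m[p], m[r]
        piv = m[r][c]
        m[r] = [v / piv for v in m[r]]                # scale the pivot to 1
        for i in range(len(m)):
            if i != r and m[i][c] != 0:               # clear the rest of column c
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

def orthogonal_to(vectors, n):
    """A non-zero x with (v, x) = 0 for every v in `vectors` (real vectors).
    The conditions form a homogeneous system with fewer equations than
    unknowns, so such an x exists (Corollary 2 to Theorem 5.4.1)."""
    R, pivots = rref(vectors)
    free = next(j for j in range(n) if j not in pivots)   # exists: len(pivots) < n
    x = [Fraction(0)] * n
    x[free] = Fraction(1)
    for i, c in enumerate(pivots):
        x[c] = -R[i][free]
    return x

def extend_orthogonal(vectors, n):
    """Extend an orthogonal set of non-zero vectors of order n to an orthogonal basis."""
    basis = [[Fraction(v) for v in x] for x in vectors]
    while len(basis) < n:
        basis.append(orthogonal_to(basis, n))
    return basis

for v in extend_orthogonal([[1, 1, 0]], 3):
    print([str(c) for c in v])
# ['1', '1', '0']
# ['-1', '1', '0']
# ['0', '0', '1']
```

Dividing each vector by its length would then give an orthonormal basis, exactly as in the final step of the argument.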

5.5.5. An interesting application of matrix technique can be
made in the calculus of observations. Consider the system of
linear equations

$$\begin{aligned} a_{11}x_1 + \cdots + a_{1n}x_n &= b_1, \\ &\;\;\vdots \\ a_{m1}x_1 + \cdots + a_{mn}x_n &= b_m, \end{aligned} \tag{5.5.4}$$

in which $m > n$, and suppose that the coefficients $a_{ij}$, $b_i$ (which
have real values) have been determined as the result of experiments
and that, owing to errors of observation, the system (5.5.4) is, in
fact, inconsistent. There exist, then, no values of $x_1, \ldots, x_n$ satisfying
(5.5.4) and we are consequently faced with the problem of
determining the set of values satisfying the system with the 'least
degree of inaccuracy'. The principle of least squares states that such
a set of values makes the expression

$$(a_{11}x_1 + \cdots + a_{1n}x_n - b_1)^2 + \cdots + (a_{m1}x_1 + \cdots + a_{mn}x_n - b_m)^2 \tag{5.5.5}$$

a minimum.† Taking this as our starting-point we can easily determine
the appropriate values of $x_1, \ldots, x_n$. We rewrite (5.5.4) in the
familiar form $A\mathbf{x} = \mathbf{b}$ and put $\boldsymbol{\delta} = A\mathbf{x} - \mathbf{b}$.

† It will have been noted that whereas the system (5.5.4) remains unaffected if
its equations are multiplied by any non-zero constants, the expression (5.5.5) is
certainly changed as the result of such an operation; and so, therefore, are the
values of $x_1, \ldots, x_n$ which are obtained by the application of the principle of least
squares. The determination of the appropriate multiples by which the equations
are to be multiplied is therefore vital; but since it depends on the 'weighting' of
observations and is not relevant in the present context we shall ignore it. The
reader who wishes to pursue this topic further may consult Whittaker and
Robinson, The Calculus of Observations (2nd edition), chap. ix.

The principle of least squares just enunciated requires that the expression

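$$\boldsymbol{\delta}^T\boldsymbol{\delta} = \mathbf{x}^TA^TA\mathbf{x} - \mathbf{x}^TA^T\mathbf{b} - \mathbf{b}^TA\mathbf{x} + \mathbf{b}^T\mathbf{b}$$

should be a minimum, so that

$$\frac{d}{d\mathbf{x}}(\boldsymbol{\delta}^T\boldsymbol{\delta}) = \mathbf{0}, \tag{5.5.6}$$

where the differential operator $d/d\mathbf{x}$ is defined by the formula

$$\frac{d}{d\mathbf{x}}\,\phi(x_1, \ldots, x_n) = \left(\frac{\partial\phi}{\partial x_1}, \ldots, \frac{\partial\phi}{\partial x_n}\right)^T.$$

Since $\mathbf{x}^TA^T\mathbf{b}$ and $\mathbf{b}^TA\mathbf{x}$ are equal scalars, it is not difficult to verify that

$$\frac{d}{d\mathbf{x}}(\boldsymbol{\delta}^T\boldsymbol{\delta}) = 2A^TA\mathbf{x} - 2A^T\mathbf{b},$$

and so (5.5.6) can be written as $A^TA\mathbf{x} = A^T\mathbf{b}$. Hence, if $A^TA$ is
non-singular, our problem has the unique solution

$$\mathbf{x} = (A^TA)^{-1}A^T\mathbf{b}. \tag{5.5.7}$$

We note that the $n \times n$ matrix $A^TA$ is non-singular if and only if
$R(A^TA) = n$ and, in view of Theorem 5.5.4 (for real $A$, the matrix $A^TA$ is
the Gram matrix of $A$), this is equivalent to
the requirement that $R(A) = n$. This means that among the
equations (5.5.4) there is at least one set of $n$ linearly independent
equations.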
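The normal equations $A^TA\mathbf{x} = A^T\mathbf{b}$ can be formed and solved directly. The sketch below is one possible implementation under stated assumptions (real rational data, $R(A) = n$, exact Gauss-Jordan elimination instead of an explicit inverse); `solve` and `least_squares` are hypothetical names. The sample data are four invented observations of a line, anticipating the example that follows.

```python
from fractions import Fraction

def solve(M, v):
    """Solve the square system M x = v by Gauss-Jordan elimination (exact)."""
    n = len(M)
    aug = [[Fraction(a) for a in row] + [Fraction(b)] for row, b in zip(M, v)]
    for c in range(n):
        p = next((i for i in range(c, n) if aug[i][c] != 0), None)
        if p is None:
            raise ValueError("matrix is singular")
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [a / piv for a in aug[c]]
        for i in range(n):
            if i != c and aug[i][c] != 0:
                f = aug[i][c]
                aug[i] = [a - f * b for a, b in zip(aug[i], aug[c])]
    return [row[n] for row in aug]

def least_squares(A, b):
    """Solve the normal equations A^T A x = A^T b, assuming R(A) = n."""
    cols = list(zip(*A))
    AtA = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    Atb = [sum(p * q for p, q in zip(ci, b)) for ci in cols]
    return solve(AtA, Atb)

# Four observations of a line y = m x + c: rows (x_i, 1), right-hand sides y_i
A = [[0, 1], [1, 1], [2, 1], [3, 1]]
b = [1, 3, 2, 5]
m, c = least_squares(A, b)
print(m, c)   # 11/10 11/10
```

Solving the linear system rather than computing $(A^TA)^{-1}$ is a design choice: it needs less work and gives the same vector (5.5.7) whenever $A^TA$ is non-singular.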

As an example consider the determination of a straight line by
means of the measurement of the position of $k$ ($\ge 3$) points on that
line. Let the coordinates of these points, as observed, be

$$(x_1, y_1), \ldots, (x_k, y_k), \tag{5.5.8}$$

and write the equation of the required line in the form $y = mx + c$.
Then the unknowns $m$, $c$ should satisfy the equations

$$mx_1 + c - y_1 = 0, \quad \ldots, \quad mx_k + c - y_k = 0.$$

If these equations are inconsistent, as in almost every practical
case they are bound to be, we have to determine the values of
$m$, $c$ which will make the line $y = mx + c$ pass 'as nearly as possible'
through the points (5.5.8). The formula (5.5.7) shows that these
values are given by

$$(m, c)^T = (A^TA)^{-1}A^T\mathbf{y},$$

where

$$A = \begin{pmatrix} x_1 & 1 \\ \vdots & \vdots \\ x_k & 1 \end{pmatrix}, \qquad \mathbf{y} = \begin{pmatrix} y_1 \\ \vdots \\ y_k \end{pmatrix}.$$

Hence

$$\begin{pmatrix} m \\ c \end{pmatrix} = \begin{pmatrix} x_1^2 + \cdots + x_k^2 & x_1 + \cdots + x_k \\ x_1 + \cdots + x_k & k \end{pmatrix}^{-1} \begin{pmatrix} x_1y_1 + \cdots + x_ky_k \\ y_1 + \cdots + y_k \end{pmatrix},$$

and it is easy to show that the first matrix on the right-hand side
is non-singular except when $x_1 = \cdots = x_k$ (its determinant equals
$\sum_{i<j}(x_i - x_j)^2$). The reader who is
familiar with statistical terminology will have now no difficulty in
verifying that the gradient $m$ of the required line is given by

$$m = r\,\frac{\sigma_y}{\sigma_x},$$

where $\sigma_x$, $\sigma_y$ are the standard deviations of the $x$'s and $y$'s respectively,
and $r$ is the coefficient of correlation between these two sets
of numbers.
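The statistical form of the gradient can be checked against the $2 \times 2$ system above. Because $r\,\sigma_y/\sigma_x = \operatorname{cov}(x, y)/\sigma_x^2$, the sketch compares $m$ with the ratio of covariance to variance, which avoids square roots and keeps the arithmetic exact; the ratio is the same whether one divides by $k$ or by $k - 1$. The function names and the data are illustrative assumptions.

```python
from fractions import Fraction

def fit_line(xs, ys):
    """(m, c) from the 2 x 2 system displayed above, solved by Cramer's rule."""
    k = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = k * sxx - sx * sx            # zero only when x_1 = ... = x_k
    if det == 0:
        raise ValueError("all x_i coincide; the matrix is singular")
    m = Fraction(k * sxy - sx * sy, det)
    c = Fraction(sxx * sy - sx * sxy, det)
    return m, c

def slope_from_statistics(xs, ys):
    """r * sigma_y / sigma_x, computed as cov(x, y) / var(x) to stay exact."""
    k = len(xs)
    mx, my = Fraction(sum(xs), k), Fraction(sum(ys), k)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / k
    var_x = sum((x - mx) ** 2 for x in xs) / k
    return cov / var_x

xs, ys = [0, 1, 2, 3], [1, 3, 2, 5]
print(fit_line(xs, ys), slope_from_statistics(xs, ys))
# (Fraction(11, 10), Fraction(11, 10)) 11/10
```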


Exercise 5.5.3. Determine the parabola $y = a + bx + cx^2$ which passes
'as nearly as possible' through the points $(x_i, y_i)$, $i = 1, \ldots, k$, where $k \ge 4$.


5.6. Further theorems on rank of matrices 

Since the idea of rank is of fundamental importance in linear 
algebra we proceed now to derive a number of results involving the 
ranks of sums and products of matrices. 

Theorem 5.6.1. If $A$ and $B$ are matrices (of the same type), then

$$R(A+B) \le R(A) + R(B).$$

Write $R(A) = r$, $R(B) = s$. Denote by $\mathbf{x}_1, \ldots, \mathbf{x}_r$ a set of $r$ linearly
independent columns of $A$ and by $\mathbf{y}_1, \ldots, \mathbf{y}_s$ a set of $s$ linearly independent
columns of $B$. Since $R(A) = r$, every column of $A$ is a linear combination
of $\mathbf{x}_1, \ldots, \mathbf{x}_r$, and similarly for $B$. Hence every column of $A+B$ is expressible
as a linear combination of the $r+s$ vectors

$$\mathbf{x}_1, \ldots, \mathbf{x}_r, \mathbf{y}_1, \ldots, \mathbf{y}_s. \tag{5.6.1}$$

Let $\mathfrak{U}$ denote the vector space spanned by the columns of $A+B$.
Then $\mathfrak{U}$ is contained in the space spanned by the vectors (5.6.1), and we have

$$R(A+B) = d(\mathfrak{U}) \le r + s = R(A) + R(B).$$

Corollary. $R(A-B) \ge |R(A) - R(B)|$.

This inequality follows at once from Theorem 5.6.1 if we replace
$A$ by $A-B$ in that result and also make use of the fact that
$R(-C) = R(C)$.

Exercise 5.6.1. Show that Theorem 5.6.1 is best possible in the sense
that the sign '$\le$' cannot be replaced by '$<$'.

Theorem 5.6.2. The rank of a product of two matrices is not
greater than the rank of either factor, i.e. if $AB$ exists, then

$$R(AB) \le \min\{R(A), R(B)\}.$$

We shall give three proofs of this result.

(i) Let $\mathfrak{U}$ be the vector space of vectors $\mathbf{x}$ such that $B\mathbf{x} = \mathbf{0}$, and
let $\mathfrak{V}$ be the vector space of vectors $\mathbf{x}$ such that $AB\mathbf{x} = \mathbf{0}$. If
$\mathbf{x} \in \mathfrak{U}$, then $\mathbf{x} \in \mathfrak{V}$, and so $\mathfrak{U} \subseteq \mathfrak{V}$. Hence, by Theorem 2.3.4 (p. 53),
$d(\mathfrak{U}) \le d(\mathfrak{V})$ and hence, by the dimensionality theorem (Theorem 5.4.2),

$$n - R(B) \le n - R(AB),$$

where $n$ is the number of columns of $B$. Hence $R(AB) \le R(B)$,
and from this we also infer that

$$R(AB) = R(B^TA^T) \le R(A^T) = R(A).$$

The assertion therefore follows.

(ii) Let $R(A) = r$ and let $n$ now denote the number of rows of $A$. Suppose
that $\lambda_1, \ldots, \lambda_n$ are numbers such that

$$\lambda_1A_{1*} + \cdots + \lambda_nA_{n*} = \mathbf{0}. \tag{5.6.2}$$

Then, by Theorem 3.3.6 (i) (p. 84),

$$\lambda_1(AB)_{1*} + \cdots + \lambda_n(AB)_{n*} = \mathbf{0}. \tag{5.6.3}$$

If $r < n$, then any $r+1$ rows of $A$ are connected by a linear relation in which
not all the coefficients are zero; and, in view of (5.6.2) and (5.6.3), it follows
that the same is true of the rows of $AB$. Hence $R(AB) \le r = R(A)$, and
when $r = n$ this inequality holds trivially. The proof is now completed in
the same way as in (i).

(iii) A longer but more direct proof is as follows. Suppose, in the first
place, that $A$, $B$ (and therefore $AB$) are square matrices of order $k$, and
write $A = (a_{ij})$, $B = (b_{ij})$. The case when $R(AB) = 0$ is trivial and we
shall assume that $R(AB) = r$, where $1 \le r \le k$. Then at least one $r$-rowed
minor of $AB$ (say $\Delta$) does not vanish, and there is no loss of generality in
supposing that $\Delta$ lies in the top left-hand corner of $AB$. Thus

$$\Delta = \begin{vmatrix} a_{11}b_{11} + \cdots + a_{1k}b_{k1} & \cdots & a_{11}b_{1r} + \cdots + a_{1k}b_{kr} \\ \vdots & & \vdots \\ a_{r1}b_{11} + \cdots + a_{rk}b_{k1} & \cdots & a_{r1}b_{1r} + \cdots + a_{rk}b_{kr} \end{vmatrix},$$

and so, by the corollary to Theorem 1.2.6 (p. 11), we obtain

$$\Delta = \sum_{\nu_1, \ldots, \nu_r = 1}^{k} \begin{vmatrix} a_{1\nu_1} & \cdots & a_{1\nu_r} \\ \vdots & & \vdots \\ a_{r\nu_1} & \cdots & a_{r\nu_r} \end{vmatrix} b_{\nu_1 1} \cdots b_{\nu_r r}.$$

But $\Delta \neq 0$ and therefore, for some values of $\nu_1, \ldots, \nu_r$ chosen from $1, 2, \ldots, k$,
we must have

$$\begin{vmatrix} a_{1\nu_1} & \cdots & a_{1\nu_r} \\ \vdots & & \vdots \\ a_{r\nu_1} & \cdots & a_{r\nu_r} \end{vmatrix} \neq 0.$$

Hence $A$ possesses at least one non-vanishing $r$-rowed minor, and so

$$R(AB) \le R(A). \tag{5.6.4}$$

Next, let $A$, $B$ be of type $l \times m$, $m \times n$ respectively, write $C = AB$, and
put $k = \max(l, m, n) + 1$. To each of the three matrices $A, B, C$ we add a
suitable number of zero rows and zero columns so as to convert them all into
square matrices of order $k$. The new matrices $A'$, $B'$, $C'$ thus obtained can
be expressed most conveniently in partitioned form:

$$A' = \begin{pmatrix} A & O \\ O & O \end{pmatrix}, \qquad B' = \begin{pmatrix} B & O \\ O & O \end{pmatrix}, \qquad C' = \begin{pmatrix} C & O \\ O & O \end{pmatrix},$$

where each $O$ denotes a zero matrix of the appropriate type.
Since $AB = C$, it follows easily by Theorem 3.8.1 (p. 104) that $A'B' = C'$.
Hence, by (5.6.4), $R(C') \le R(A')$. But clearly

$$R(A') = R(A), \qquad R(B') = R(B), \qquad R(C') = R(AB);$$

and therefore (5.6.4) continues to hold in the general case.


The result just proved establishes an inequality between the 
ranks of two matrices and the rank of their product. In one 
important case this inequality may be sharpened to an equality. 

Theorem 5.6.3. The rank of a matrix remains unchanged if the
matrix is premultiplied or postmultiplied by a non-singular square
matrix.

Let $A$ be an $m \times n$ matrix and let $X$, $Y$ be non-singular square
matrices of order $m$, $n$ respectively. Then, by Theorem 5.6.2,
$R(XA) \le R(A)$ and also

$$R(A) = R(X^{-1}XA) \le R(XA).$$

Hence $R(XA) = R(A)$. This implies

$$R(AY) = R(Y^TA^T) = R(A^T) = R(A).$$

Corollary. Rank is invariant under similarity transformations.

In view of Theorem 4.2.2 (p. 116), the interpretation of Theorem
5.6.3 in terms of linear transformations is clear. It shows, in fact,
that the rank of a matrix is a number which is characteristic not of
that matrix only but of the entire class of matrices representing
the same linear transformation. The corollary expresses the same
idea for the special case of square matrices and linear transformations
of a linear manifold into itself.†

Theorem 5.6.2 gives an upper bound for the rank of the product 
of two matrices. To obtain a lower bound we first need a preliminary 
result. 

Theorem 5.6.4. The vectors of the form $B\mathbf{x}$, subject to the condition
$AB\mathbf{x} = \mathbf{0}$, constitute a vector space of dimensionality $R(B) - R(AB)$.

We note that when $B = I$, this reduces to the dimensionality
theorem (Theorem 5.4.2, p. 149). However, the argument below
does not contain a new proof of Theorem 5.4.2 since it depends
itself on that theorem.

Let $\mathfrak{V}$ be the set of all $B\mathbf{x}$ such that $AB\mathbf{x} = \mathbf{0}$. It is at once
obvious that $\mathfrak{V}$ is a vector space. Let $A$, $B$ be of type $m \times n$, $n \times p$
respectively and write $p - R(B) = q$, $p - R(AB) = r$. Then, by
Theorem 5.6.2, $0 \le q \le r$. We shall, in the first place, assume that
$0 < q < r$.

By Theorem 5.4.2, the vector space of vectors x satisfying
Bx = 0 is q-dimensional, and so has a basis $\{x_1,\dots,x_q\}$, say. Again,
the vector space of vectors x satisfying ABx = 0 is r-dimensional
and, in view of the corollary to Theorem 2.3.5 (p. 54), there exist
vectors $x_{q+1},\dots,x_r$ such that $\{x_1,\dots,x_q,x_{q+1},\dots,x_r\}$ is a basis of this
vector space. The r − q vectors $Bx_{q+1},\dots,Bx_r$ are linearly independent;
for if

$0 = \alpha_{q+1}Bx_{q+1}+\dots+\alpha_rBx_r = B(\alpha_{q+1}x_{q+1}+\dots+\alpha_rx_r)$,

then there exist scalars $\alpha_1,\dots,\alpha_q$ such that

$\alpha_{q+1}x_{q+1}+\dots+\alpha_rx_r = \alpha_1x_1+\dots+\alpha_qx_q$,

and this implies that $\alpha_{q+1} = \dots = \alpha_r = 0$. Thus $\mathfrak{B}$ possesses r − q
linearly independent vectors $Bx_{q+1},\dots,Bx_r$. Suppose now that
$Bx \in \mathfrak{B}$. Then ABx = 0, and so x can be written in the form

$x = \alpha_1x_1+\dots+\alpha_rx_r$.

Hence, since $Bx_1 = \dots = Bx_q = 0$,

$Bx = \alpha_{q+1}Bx_{q+1}+\dots+\alpha_rBx_r$,




† Cf. Theorem 4.2.6 (i) (p. 119).



162 LINEAR EQUATIONS AND RANK OF MATRICES V, § 6.6 

and therefore $Bx_{q+1},\dots,Bx_r$ are generators of $\mathfrak{B}$. Thus, when
$0 < q < r$, $d(\mathfrak{B}) = r - q = R(B) - R(AB)$.

The remaining cases can be disposed of without difficulty. If
$0 = q = r$, then only the zero vector satisfies ABx = 0. Hence $\mathfrak{B}$
consists of 0 only, and $d(\mathfrak{B}) = 0 = r - q$. If $0 < q = r$, then the
q linearly independent solutions $x_1,\dots,x_q$ of Bx = 0 form a basis
of the space of vectors x such that ABx = 0. Hence any such vector
x may be written as

$x = \beta_1x_1+\dots+\beta_qx_q$.

This implies that Bx = 0, and hence $\mathfrak{B}$ consists of 0 only, so that
$d(\mathfrak{B}) = 0 = r - q$. Finally, let $0 = q < r$. In this case let $\{x_1,\dots,x_r\}$
denote a basis of the space of vectors x such that ABx = 0. Then,
arguing as above, we easily see that $Bx_1,\dots,Bx_r$ are linearly
independent generators of $\mathfrak{B}$. Hence $d(\mathfrak{B}) = r = r - q$.
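Theorem 5.6.4 can also be checked directly on small examples by building the space of vectors Bx with ABx = 0 and measuring its dimension. The sketch below assumes exact arithmetic via `fractions`; the helper names (`nullspace_basis`, `dim_B_space` and so on) and the two pairs of matrices are illustrative assumptions. The second pair is a degenerate case in which the space consists of 0 only.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a list of rows, by exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        piv = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def matvec(M, x):
    return [sum(a * b for a, b in zip(row, x)) for row in M]

def nullspace_basis(rows):
    """A basis of {x : Mx = 0}, read off from the reduced row echelon form of M."""
    m = [[Fraction(v) for v in row] for row in rows]
    ncols = len(m[0])
    pivots, r = [], 0
    for c in range(ncols):
        piv = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        p = m[r][c]
        m[r] = [v / p for v in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    basis = []
    for free in (c for c in range(ncols) if c not in pivots):
        x = [Fraction(0)] * ncols
        x[free] = Fraction(1)          # one free unknown set to 1, the others to 0
        for i, pc in enumerate(pivots):
            x[pc] = -m[i][free]
        basis.append(x)
    return basis

def dim_B_space(A, B):
    """Dimension of the space of vectors Bx with ABx = 0, computed directly."""
    images = [matvec(B, x) for x in nullspace_basis(matmul(A, B))]
    return rank(images) if images else 0

A1, B1 = [[1, 1, 0], [2, 2, 0]], [[1, 0, 1, 0], [0, 1, 1, 0], [0, 0, 0, 1]]
A2, B2 = [[1, 0, 0]], [[1, 2], [2, 4], [0, 0]]
for A, B in ((A1, B1), (A2, B2)):
    print(dim_B_space(A, B), rank(B) - rank(matmul(A, B)))   # 2 2, then 0 0
```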

Theorem 5.6.5. If the product AB exists, then

$R(AB) \ge R(A) + R(B) - n$,

where n is the number of columns in A (and of rows in B).

Theorems 5.6.2 and 5.6.5 were found, for square matrices, by
Sylvester in 1884 and are known jointly as Sylvester's law of nullity.

The ‘nullity’, say $\nu(A)$, of an n × n matrix A is defined as $n - R(A)$.
For square matrices Theorems 5.6.2 and 5.6.5 state that

$\max\{\nu(A), \nu(B)\} \le \nu(AB) \le \nu(A) + \nu(B)$.

To prove Theorem 5.6.5 we do not require the full force of
Theorem 5.6.4. For our purpose it is sufficient to know that there
exist $R(B) - R(AB)$ linearly independent vectors y = Bx satisfying
Ay = 0. Hence, by Theorem 5.4.2,

$R(B) - R(AB) \le n - R(A)$,

and the assertion follows.
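Sylvester's law of nullity lends itself to a randomized sanity check. The following sketch assumes Python's `random` and `fractions` modules and uses our own hypothetical helper `random_low_rank` (a product of two thin integer matrices, so that ranks below the maximum occur often) to test both bounds on a few hundred random pairs. Passing such a test is evidence, not a proof; the final example shows that the lower bound of Theorem 5.6.5 can be attained.

```python
import random
from fractions import Fraction

def rank(rows):
    """Rank of a list of rows, by exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        piv = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def random_low_rank(rows, cols, k, rng):
    """Integer rows x cols matrix of rank at most k (a product of two thin factors)."""
    U = [[rng.randint(-2, 2) for _ in range(k)] for _ in range(rows)]
    V = [[rng.randint(-2, 2) for _ in range(cols)] for _ in range(k)]
    return matmul(U, V)

def check_sylvester(trials=300, seed=1):
    """Check R(A) + R(B) - n <= R(AB) <= min(R(A), R(B)) on random examples."""
    rng = random.Random(seed)
    for _ in range(trials):
        m, n, p = rng.randint(1, 5), rng.randint(1, 5), rng.randint(1, 5)
        A = random_low_rank(m, n, rng.randint(1, n), rng)
        B = random_low_rank(n, p, rng.randint(1, n), rng)
        rA, rB, rAB = rank(A), rank(B), rank(matmul(A, B))
        assert rA + rB - n <= rAB <= min(rA, rB), (A, B)
    return True

# The lower bound can be attained: here R(A) + R(B) - n = 1 + 1 - 2 = 0 = R(AB).
A = [[1, 0], [0, 0]]
B = [[0, 0], [0, 1]]
print(check_sylvester(), rank(matmul(A, B)))   # True 0
```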

Exercise 5.6.2. Show that the sign in Theorem 5.6.5 cannot be
replaced by ‘ 

A result proved by Frobenius in 1911 goes a little beyond 
Theorem 5.6.5. 

Theorem 5.6.6. If ABC exists, then

$R(AB) + R(BC) \le R(B) + R(ABC)$.




This theorem contains the law of nullity, for when B = I it
reduces to Theorem 5.6.5, while the two cases A = O and C = O
lead at once to Theorem 5.6.2.

Denote by $\mathfrak{U}$ the vector space of all vectors BCx such that
ABCx = 0, and by $\mathfrak{B}$ the vector space of all vectors Bx such that
ABx = 0. Clearly $\mathfrak{U} \subseteq \mathfrak{B}$, and so $d(\mathfrak{U}) \le d(\mathfrak{B})$. But, by Theorem
5.6.4,

$d(\mathfrak{U}) = R(BC) - R(ABC)$,  $d(\mathfrak{B}) = R(B) - R(AB)$,

and the required theorem follows at once.
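The same kind of randomized check applies to Frobenius's inequality, Theorem 5.6.6. As before, this is a hedged sketch under our own assumptions (exact arithmetic with `fractions`, random low-rank integer factors, the helper names shown), not a substitute for the proof.

```python
import random
from fractions import Fraction

def rank(rows):
    """Rank of a list of rows, by exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        piv = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def random_low_rank(rows, cols, k, rng):
    """Integer rows x cols matrix of rank at most k."""
    U = [[rng.randint(-2, 2) for _ in range(k)] for _ in range(rows)]
    V = [[rng.randint(-2, 2) for _ in range(cols)] for _ in range(k)]
    return matmul(U, V)

def check_frobenius(trials=300, seed=2):
    """Check R(AB) + R(BC) <= R(B) + R(ABC) on random low-rank examples."""
    rng = random.Random(seed)
    for _ in range(trials):
        m, n, p, q = (rng.randint(1, 4) for _ in range(4))
        A = random_low_rank(m, n, rng.randint(1, 3), rng)
        B = random_low_rank(n, p, rng.randint(1, 3), rng)
        C = random_low_rank(p, q, rng.randint(1, 3), rng)
        AB, BC = matmul(A, B), matmul(B, C)
        assert rank(AB) + rank(BC) <= rank(B) + rank(matmul(AB, C)), (A, B, C)
    return True

print(check_frobenius())   # True
```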

It should be noted that in the discussion of the last three theorems 
we have been employing, in effect, arguments of the ‘invariant’ 
type. 


PROBLEMS ON CHAPTER V 

1. Solve completely the following systems of equations. 


(i) 2x’—Zy-\~Qz = 3, 

4:X — y-\-z ~ 1, 

Zx—2y-\'^z = 4. 

(iii) tSx-{-2y—Zu—v = 11, 

6x^y-\-bz—u—2v — 2, 
x--2y-j-4z-i-u — v = —5. 

(v) l\x-\-Sy—2z-\-Sw = 0, 

2x-\-3y—z-{-2w “ 0, 
Ix—y-^-z—Sw = 0 , 
4x—\ly-{-6z~-\2w ~ 0. 

(vii) 4x—2y-\-bz — 0, 

x — y-\-z = 0, 

bx--y-{-lz — 0 , 

— Zx-\-y — 4z = 0. 


(ii) $x + y - 2z + 3w = 0$,

$x - 2y + z - w = 0$,
$x + 7y - 8z + 11w = 0$,
$x - 5y + 4z - 5w = 0$.

(iv) —x-^2y-\-z-\~^w — 6, 

7a;+7y—13s + 3ty — —24, 

Zx-\-y — 6z~w— — 12 . 

(vi) x-\-2y—z = 2, 

2x—^y-{-lz = — 1 , 
— x-\-y-\-^z “ 6 , 

5x-\-y — 2z — 0. 

(viii) $3x + 2y - 2z - w = 9$,

$x - y + z + 2w = 7$,
$6x - 10y + 10z + 17w = 47$.


2. Solve completely the system of equations

$(n-1)x_1 = x_2 + x_3 + \dots + x_n$,
$(n-1)x_2 = x_1 + x_3 + \dots + x_n$,
$\vdots$
$(n-1)x_n = x_1 + x_2 + \dots + x_{n-1}$.

3. For what values of a, b, c is the system of equations

$(a+3)x - 2y + 3z = 4$,
^x+{a—Z)y-{‘^z — b, 

$4x - 8y + (a+14)z = c$

consistent?






4. Discuss, for all values of a, the system of equations

$x + y + z = 2$,
$2x + y - 2z = 2$,
$ax + y + 4z = 2$.

5. Discuss, for all values of a, the system of equations

$ax + (3a+4)y + 2(a+1)z = 0$,
$ax + (4a+2)y + (a+4)z = 0$,
$2x + (3a+4)y + 3az = 0$.


6. Illustrate Theorem 5.4.2 by discussing, for all values of p, the system
of equations

$(2p+1)x + (p+2)y + (3p+3)z = 0$,
$(6p+1)x + (3p+3)y + (7p+6)z = 0$,
$3px + 3py + (4p+2)z = 0$.

7. Determine the conditions for the consistency of the system of equations

$x + y - 2z = 0$,
$ax + by + cz = 0$,
$bx + cy + az = d$,

and obtain the complete solution in each consistent case.

8. Determine the conditions for the consistency of the system of equations 

35+2/4-2^ = 
ax~\-hy-\-hz = a®, 
ax-\-cy-{-dz = a6, 

and obtain the complete solution in each consistent case. 

9. Determine the conditions for the consistency of the system of equations

$x + y + z = 1$,
$ax + by + cz = d$,
$a^2x + b^2y + c^2z = d^2$,

and obtain the complete solution in each consistent case.

10. Determine the conditions for the consistency of the system of equations

$x + y + z = 1$,
$ax + by + cz = d$,
a®a;46®y>f = d®. 


and obtain the complete solution in each consistent case. 

11. Let $\mathfrak{M}$ be a linear manifold, of dimensionality n, over a reference field
$\mathfrak{F}$. Show that there is a biunique correspondence between the set of linear
transformations of $\mathfrak{M}$ into $\mathfrak{F}$, the set of vectors of order n over $\mathfrak{F}$, and the set
of linear forms in n variables with coefficients in $\mathfrak{F}$.

12. Show, by an example, that linear dependence of the columns of a
matrix does not imply the linear dependence of the rows.

13. Let A, B be matrices of type m × n, n × p respectively. Show that
the inequality $R(AB) \ge R(A) + p - n$ does not always hold.

14. Let A be a square matrix. Show that the inequality $R(A) \le R(A^*) + 1$
does not always hold.

15. Show that, if the numbers $a_1,\dots,a_n$ are distinct, then the n vectors
$(1, a_r, a_r^2, \dots, a_r^{n-1})$ (r = 1,...,n) are linearly independent.





16. The vectors are defined by the equations 

Xi = (i = l,...,m;y = 1.n). 

Show that, if w > 3, then are linearly dependent. 

17. Show that, given any system of m linear homogeneous equations in
m + 2 unknowns $x_1,\dots,x_{m+2}$, there are two indices i, j in the range 1, 2,..., m + 2
such that the system possesses a solution in which $x_i$ and $x_j$ have any
prescribed values.

18. Show that, for any given constants $a_{ij}$ and $b_i$ (i = 1,..., m; j = 1,..., n),
exactly one of the two systems of equations

$\sum_{j=1}^{n} a_{ij}x_j = b_i$  (i = 1,..., m),

$\sum_{i=1}^{m} a_{ij}y_i = 0$  (j = 1,..., n),  $\sum_{i=1}^{m} b_iy_i = 1$,

is consistent.

19. Show that, if A has n columns and B n rows and if AB = O, then
$R(A) + R(B) \le n$.

20. Show that, if the n × n matrix A satisfies the equation $A^2 = A$, then
$R(A) + R(I - A) = n$.

21. Let A, B be rectangular matrices for which the product AB is defined.
Show that $R(AB) = R(B)$ if and only if ABx = 0 implies Bx = 0.

22. Let A be a square matrix, p a positive integer, and x a non-zero vector
such that $A^px \ne 0$, $A^{p+1}x = 0$. Show that the vectors $x, Ax, \dots, A^px$ are
linearly independent.

23. Show that, if $R(A) = R(BA)$, then $R(AC) = R(BAC)$.

24. Show that, for $k \ge 2$,

$R(A_1\cdots A_k) \ge R(A_1) + \dots + R(A_k) - n(k-1)$,

where $A_1,\dots,A_k$ are square matrices of order n.

25. Show that, if k is a positive integer such that $R(A^k) = R(A^{k+1})$, then

$R(A^{k+1}) = R(A^{k+2}) = R(A^{k+3}) = \dots$.

26. A sequence $\{a_k\}$ of real numbers is said to be convex if, for all k,
$a_k + a_{k+2} \ge 2a_{k+1}$.

Show that the sequence $\{R(A^k)\}$ is convex.

27. Let A be any n × n matrix. Show, using either No. 22 or No. 25, that

$R(A^n) = R(A^{n+1}) = R(A^{n+2}) = \dots$.

28. Let the matrices $P_1,\dots,P_k, Q_1,\dots,Q_k$ commute in pairs, and suppose
that $R(P_\nu) = R(P_\nu Q_\nu)$ (ν = 1,..., k). Show that

$R(P_1\cdots P_k) = R(P_1\cdots P_kQ_1\cdots Q_k)$.

29. Let the matrices $A_1,\dots,A_k$ commute in pairs and suppose that
$R(A_\nu) = R(A_\nu^2)$ (ν = 1,..., k). Show that, for any positive integers $r_1,\dots,r_k$,

$R(A_1\cdots A_k) = R(A_1^{r_1}\cdots A_k^{r_k})$.




30. Let $\theta_1,\dots,\theta_n$ be given numbers and let the linear forms $L_1,\dots,L_n$ be
defined by

$L_k = L_k(x_1,\dots,x_n) = x_1 + \theta_kx_2 + \theta_k^2x_3 + \dots + \theta_k^{n-1}x_n$  (k = 1,..., n).

Show that the maximum number of linearly independent forms among
$L_1,\dots,L_n$ is equal to the number of distinct numbers among $\theta_1,\dots,\theta_n$.

31. Let $u_1,\dots,u_n$ be n linearly independent complex vectors of order n.

Show that there exist n vectors $x_1,\dots,x_n$ such that $(x_i, u_j) = \delta_{ij}$ $(1 \le i, j \le n)$.

32. Let A be an m × n matrix of rank r. Show that the general solution
of the system of homogeneous equations Ax = 0 can be written in the form
$x = P\xi$, where P is any fixed n × (n − r) matrix such that AP = O,
$R(P) = n - r$, and ξ is a variable vector of order n − r.

33. Let n > 1 and suppose that A is an n × n matrix. Show that (i)
$R(A) = n$ if and only if $R(A^*) = n$; (ii) $R(A) = n - 1$ if and only if $R(A^*) = 1$;
(iii) $R(A) < n - 1$ if and only if $A^* = O$.

34. Show that, if the matrix A is singular, then its adjugate can be
expressed in the form $A^* = xy'$, where x and y are column vectors.

35. Let $\mathfrak{U}$ be an r-dimensional subspace of $\mathfrak{V}_n$, where $1 \le r \le n$. Show
that an n × n matrix A satisfies the conditions

$Ax = 0$ (for all $x \in \mathfrak{U}$),  $Ax \ne 0$ (for all $x \notin \mathfrak{U}$)

if and only if it satisfies the conditions

$Ax_1 = \dots = Ax_r = 0$,  $R(A) = n - r$,

where $x_1,\dots,x_r$ are any r linearly independent vectors of $\mathfrak{U}$.

36. Suppose that the system of equations $\sum_{j=1}^{n} a_{ij}x_j = b_i$ (i = 1,..., m)
possesses a unique solution. Show that this solution is given by

$x_j = \sum_{i=1}^{m} \sigma_{ji}b_i$  (j = 1,..., n),

where the σ's depend only on the a's.

37. A square matrix A, of order n, is called a G-matrix if the transformation
y = Ax implies the identity $y'Gy = x'Gx$, where G is a given non-singular
matrix. Prove that, if A is a G-matrix, then $|A| = \pm 1$.

If $A = (a_{ij})$ and $B = (b_{ij})$ are two G-matrices and if $|A| + |B| = 0$,
prove that the system of equations

$\sum_{j=1}^{n} (a_{ij} + b_{ij})x_j = 0$  (i = 1,..., n)

possesses a non-trivial solution.

38. A is a symmetric n × n matrix of rank n − 1, and ξ is a non-zero vector
such that Aξ = 0. Show that the relations $\xi'u = 0$ and $u'A^*u = 0$ imply
each other.

39. Prove that a necessary and sufficient condition for the system of
equations

$\sum_{j=1}^{n} a_{ij}x_j = b_i$  (i = 1,..., m)

to have a solution is that every solution of the system of equations

$\sum_{i=1}^{m} a_{ij}y_i = 0$  (j = 1,..., n)

should also satisfy $\sum_{i=1}^{m} b_iy_i = 0$.

Find for what values of η the equations

$x + y + z = 1$,
$x + 2y + 4z = \eta$,
…

have a solution; and solve them completely in each case.

40. Let $k \le n$ and let $L_1,\dots,L_k$ be linearly independent linear forms in
$x_1,\dots,x_n$. Show that there exists a non-singular linear transformation of
$x_1,\dots,x_n$ into $y_1,\dots,y_n$ which carries $L_1,\dots,L_k$ into $y_1,\dots,y_k$ respectively.

41. A is a square matrix of order n and rank r; $x_1,\dots,x_k$ are n × 1 matrices
and $u_1,\dots,u_k$ are 1 × n matrices. The matrix B is formed by bordering A
with columns $x_1,\dots,x_k$ and rows $u_1,\dots,u_k$, and completing the matrix with a
block of zeros. Prove that, if $k < n - r$, then |B| = 0.

42. Show that the system of equations $\sum_{j=1}^{n} a_{ij}x_j = b_i$ (i = 1,..., m) is
consistent if and only if

$\sum_{i=1}^{m} y_i \sum_{j=1}^{n} a_{ij}x_j = 0$  (identically in $x_1,\dots,x_n$)

implies $\sum_{i=1}^{m} b_iy_i = 0$.

43. A and B are rectangular matrices having the same number of columns.
Show that a necessary and sufficient condition for the existence of a matrix
C such that B = CA is that Ax = 0 should imply Bx = 0.



VI 


ELEMENTARY OPERATIONS AND THE 
CONCEPT OF EQUIVALENCE 

Though the results obtained below are interesting and afford
additional insight into the structure of matrices, they will not,
with very few exceptions, be needed subsequently. The present
chapter (apart from § 6.5) may therefore be omitted by the reader
who is anxious to proceed at once to the more advanced theory of
Part II.

6.1. E-operations and E-matrices

6.1.1. Definition 6.1.1. An elementary operation (or,
more briefly, an E-operation) on a matrix is an operation of one
of the following three types.

(i) The interchange of two rows (or columns).

(ii) The multiplication of a row (or column) by a non-zero scalar.

(iii) The addition of a multiple of one row (or column) to another
row (or column).

We distinguish between row operations and column operations
according as the E-operations in question apply to rows or to
columns.

Our aim is to study the effect of E-operations upon matrices, and
it is useful to note at the outset that if B is obtained from A by an
E-operation, then A can be obtained from B by an E-operation.

Exercise 6.1.1. Prove this statement and deduce that, if D is obtained
from C by a chain of E-operations, then the converse is also true, i.e. C
can be obtained from D by a chain of E-operations.†

Exercise 6.1.2. Show that the identical transformation, i.e. one which
leaves the matrix unchanged, is an E-operation.

We shall adopt the following notation for E-operations. The
interchange of the ith and jth rows (columns) will be denoted by
$R_i \leftrightarrow R_j$ ($C_i \leftrightarrow C_j$); the multiplication of the ith row (column)
by a scalar $\lambda \ne 0$ by $R_i \to \lambda R_i$ ($C_i \to \lambda C_i$); and the addition
of μ times the jth row (column) to the ith row (column) by
$R_i \to R_i + \mu R_j$ ($C_i \to C_i + \mu C_j$).

† The term ‘chain’ is taken to mean ‘finite chain’. A chain may, of course,
consist of a single operation only.



VI, § 6.1 JT-OPERATIONS AND N-MATRICES 169 

Theorem 6.1.1. The rank of a matrix is invariant (i.e. remains 
unchanged) under elementary operations. 

Operations of types (i) and (ii) obviously do not affect rank and 
we need therefore only consider those of type (iii). We may, without 
loss of generality, restrict our attention to row operations. 

Let, then, the matrix B be obtained as the result of an E-operation
of type (iii) on the rows of A. Then each row of B is a linear
combination of the rows of A. Hence the maximum number of
linearly independent rows in B is not greater than the maximum
number of linearly independent rows in A,† i.e. $R(B) \le R(A)$.
But A may be obtained from B by means of an E-operation; hence
$R(A) \le R(B)$, and so $R(A) = R(B)$. The proof is therefore
complete.‡
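The three types of E-operation are easy to model as functions returning a new matrix. The sketch below (the function names `swap`, `scale` and `add_multiple` are our own, and only row operations are shown) applies a short chain to a matrix of rank 2, confirms that the rank is unchanged, as Theorem 6.1.1 asserts, and undoes the chain with the inverse operations in reverse order, in the spirit of Exercise 6.1.1.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a list of rows, by exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        piv = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def swap(M, i, j):
    """R_i <-> R_j (type (i))."""
    M = [row[:] for row in M]
    M[i], M[j] = M[j], M[i]
    return M

def scale(M, i, lam):
    """R_i -> lam R_i with lam != 0 (type (ii))."""
    if lam == 0:
        raise ValueError("the scalar must be non-zero")
    M = [row[:] for row in M]
    M[i] = [lam * v for v in M[i]]
    return M

def add_multiple(M, i, j, mu):
    """R_i -> R_i + mu R_j with i != j (type (iii))."""
    if i == j:
        raise ValueError("i and j must be different rows")
    M = [row[:] for row in M]
    M[i] = [a + mu * b for a, b in zip(M[i], M[j])]
    return M

A = [[Fraction(v) for v in row] for row in [[1, 2, 3], [2, 4, 6], [1, 0, 1]]]

# A chain of three E-operations (the innermost call is applied first) ...
B = add_multiple(scale(swap(A, 0, 2), 1, Fraction(-3)), 0, 1, Fraction(5))
# ... and the inverse operations, applied in reverse order.
C = swap(scale(add_multiple(B, 0, 1, Fraction(-5)), 1, Fraction(-1, 3)), 0, 2)

print(rank(A), rank(B), C == A)   # 2 2 True
```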

The notion of invariance which appears in Theorem 6.1.1, and which we
have met earlier on a number of occasions, is a key concept in algebra. Before
passing on to further results on E-operations we shall, therefore, formulate
this notion in general terms.

Let $\mathfrak{S}$ be a set whose elements we denote by x, y, z,.... Let Ω be a
transformation of $\mathfrak{S}$ into itself, i.e. an operator operating on the elements of $\mathfrak{S}$ in
such a way that with each element $x \in \mathfrak{S}$ is associated a unique element
$y = \Omega x \in \mathfrak{S}$. Suppose, furthermore, that a function f is defined on $\mathfrak{S}$ so
that with each $x \in \mathfrak{S}$ is associated a functional value f(x) (which need not,
of course, be either a number or an element of $\mathfrak{S}$). If, for every $x \in \mathfrak{S}$,

$f(\Omega x) = f(x)$,   (6.1.1)

then we say that f is invariant under (or with respect to) the operator Ω.

In Theorem 6.1.1 above, $\mathfrak{S}$ is the set of matrices x, y,...; f(x) is the rank of
the matrix x, and Ω is any E-operation. In the corollary to Theorem 5.6.3
(p. 160), $\mathfrak{S}$ is the set of square matrices, f(x) the rank of the matrix x, and
Ω any similarity transformation. Again, the proposition that the value of a
determinant remains unaltered when its rows and columns are interchanged
may also be expressed in the same terms by taking $\mathfrak{S}$ as the set of square
matrices, f(x) as the determinant of the matrix x, and Ω as the operation of
transposition.

It is easy to see why the notion of invariance is important. For let $\mathfrak{O}$ be
a set of operators on $\mathfrak{S}$ and suppose that (6.1.1) holds for every $x \in \mathfrak{S}$ and
for every $\Omega \in \mathfrak{O}$; in other words, suppose that f is invariant under every
operator of the set $\mathfrak{O}$. If now the determination of $f(x_0)$, for some particular
$x_0 \in \mathfrak{S}$, is difficult or tedious, we may be able to find a suitable operator
$\Omega_0 \in \mathfrak{O}$ such that the determination of $f(\Omega_0x_0)$ can be carried out more easily.
Since, however, $f(x_0) = f(\Omega_0x_0)$, the original difficulty will then have been

† Cf. Problem II, 6.

‡ For an alternative proof depending on consideration of minors, see Bôcher,
11, 66.



170 ELEMENTARY OPERATIONS AND EQUIVALENCE VI, §6.1 

overcome. One useful application of this procedure will be found, for example, 
at the end of § 6.2.2. 

6.1.2. We resume now our discussion of E-operations. The
next step, and an important one, is to represent such operations by
means of matrix multiplication. This device will enable us to use
matrix technique when dealing with E-operations.

Definition 6.1.2. An elementary matrix (or, more briefly,
E-matrix) is any matrix derived from a unit matrix by a single
E-operation.

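Thus, for instance,

$\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}$,  $\begin{pmatrix} 1 & 0 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & 1 \end{pmatrix}$,  $\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 3 & 0 & 1 \end{pmatrix}$

are the E-matrices obtained from $I_3$ by means of the E-operations
$C_1 \leftrightarrow C_2$, $C_2 \to 5C_2$, $R_3 \to R_3 + 3R_1$ respectively.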

It should be noted that any E-matrix can be obtained equally
well by a row operation or a column operation on I. For the operations
$R_i \leftrightarrow R_j$ and $C_i \leftrightarrow C_j$ have the same effect on I; so have
$R_i \to \lambda R_i$ and $C_i \to \lambda C_i$; and, finally, so have $R_i \to R_i + \mu R_j$ and
$C_j \to C_j + \mu C_i$.

Theorem 6.1.2. An elementary operation on the rows (columns)
of an m × n matrix A is equivalent to premultiplication (postmultiplication)
of A by the elementary matrix derived by the same
operation from $I_m$ ($I_n$).

Let φ denote any E-operation on the rows, ψ any E-operation on
the columns; and write φ(X), ψ(X) for the matrices obtained when
X is operated on by φ, ψ respectively. The theorem then asserts that

$\phi(A) = \phi(I_m)A$,  $\psi(A) = A\psi(I_n)$.

The proof is almost immediate. For if X, Y are any matrices for
which the product XY exists, then the ith row of XY is the ith row of X
multiplied by Y; and consequently any E-operation on the rows of X
effects the same operation on the rows of XY. Thus we have

$\phi(XY) = \phi(X)Y$.   (6.1.2)

Similarly, any E-operation on the columns of Y effects the same
operation on the columns of XY, so that

$\psi(XY) = X\psi(Y)$.   (6.1.3)



As special cases of (6.1.2) and (6.1.3) we obtain

$\phi(A) = \phi(I_mA) = \phi(I_m)A$;  $\psi(A) = \psi(AI_n) = A\psi(I_n)$,

and the theorem is therefore proved.


Exercise 6.1.3. Let A be an m × n matrix and let E, E' be any E-matrices
of order m, n respectively. Show that EA can be obtained by an E-operation
on the rows of A and AE' by an E-operation on the columns of A.

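To illustrate Theorem 6.1.2, consider for instance the 3 × 4
matrix

$A = \begin{pmatrix} a_1 & a_2 & a_3 & a_4 \\ b_1 & b_2 & b_3 & b_4 \\ c_1 & c_2 & c_3 & c_4 \end{pmatrix}$.

If φ is the operation $R_2 \leftrightarrow R_3$, then

$\phi(A) = \begin{pmatrix} a_1 & a_2 & a_3 & a_4 \\ c_1 & c_2 & c_3 & c_4 \\ b_1 & b_2 & b_3 & b_4 \end{pmatrix}$,  $\phi(I_3) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}$;

and we have, in fact,

$\begin{pmatrix} a_1 & a_2 & a_3 & a_4 \\ c_1 & c_2 & c_3 & c_4 \\ b_1 & b_2 & b_3 & b_4 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}\begin{pmatrix} a_1 & a_2 & a_3 & a_4 \\ b_1 & b_2 & b_3 & b_4 \\ c_1 & c_2 & c_3 & c_4 \end{pmatrix}$.

Again, if ψ is the operation $C_4 \to C_4 - 2C_1$, then

$\psi(A) = \begin{pmatrix} a_1 & a_2 & a_3 & a_4 - 2a_1 \\ b_1 & b_2 & b_3 & b_4 - 2b_1 \\ c_1 & c_2 & c_3 & c_4 - 2c_1 \end{pmatrix}$,  $\psi(I_4) = \begin{pmatrix} 1 & 0 & 0 & -2 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$,

and we have, in fact,

$\begin{pmatrix} a_1 & a_2 & a_3 & a_4 - 2a_1 \\ b_1 & b_2 & b_3 & b_4 - 2b_1 \\ c_1 & c_2 & c_3 & c_4 - 2c_1 \end{pmatrix} = \begin{pmatrix} a_1 & a_2 & a_3 & a_4 \\ b_1 & b_2 & b_3 & b_4 \\ c_1 & c_2 & c_3 & c_4 \end{pmatrix}\begin{pmatrix} 1 & 0 & 0 & -2 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$.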

Theorem 6.1.3. (i) Every elementary matrix is non-singular.
(ii) The inverse of an elementary matrix is again an elementary matrix.

The first assertion is obvious by Theorem 6.1.1, since an E-matrix
is derived by an E-operation from a non-singular matrix.

To prove the second assertion let E denote a given E-matrix, say
of order n. Then $I_n$ can be derived from E by a row operation, i.e.



$I_n = \phi(E)$. Writing $E' = \phi(I_n)$ and using Theorem 6.1.2 we
obtain

$I_n = \phi(E) = \phi(I_n)E = E'E$.

The inverse of E is therefore the E-matrix E'.
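Part (ii) of Theorem 6.1.3 can be seen concretely: each E-matrix is undone by the E-matrix of the reverse operation, namely $R_i \leftrightarrow R_j$ by itself, $R_i \to \lambda R_i$ by $R_i \to \lambda^{-1}R_i$, and $R_i \to R_i + \mu R_j$ by $R_i \to R_i - \mu R_j$. The helpers below are hypothetical names used only for this sketch, with exact arithmetic from `fractions`.

```python
from fractions import Fraction

def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def e_swap(n, i, j):
    """E-matrix obtained from I_n by R_i <-> R_j."""
    E = identity(n)
    E[i], E[j] = E[j], E[i]
    return E

def e_scale(n, i, lam):
    """E-matrix obtained from I_n by R_i -> lam R_i (lam != 0)."""
    E = identity(n)
    E[i][i] = Fraction(lam)
    return E

def e_add(n, i, j, mu):
    """E-matrix obtained from I_n by R_i -> R_i + mu R_j (i != j): mu sits at (i, j)."""
    E = identity(n)
    E[i][j] = Fraction(mu)
    return E

# Each E-matrix paired with the E-matrix of the reverse operation.
pairs = [
    (e_swap(3, 0, 2), e_swap(3, 0, 2)),
    (e_scale(3, 1, 5), e_scale(3, 1, Fraction(1, 5))),
    (e_add(3, 2, 0, 3), e_add(3, 2, 0, -3)),
]
for E, E_inv in pairs:
    print(matmul(E_inv, E) == identity(3) == matmul(E, E_inv))   # True (three times)
```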

6.2. Equivalent matrices 

6.2.1. We begin this section by introducing a general notion 
which is useful in the investigation of algebraic problems. 

Let $\Omega_1$, $\Omega_2$ be operators mapping a set $\mathfrak{S}$ of elements into itself,
so that if $x \in \mathfrak{S}$, then $\Omega_1x, \Omega_2x \in \mathfrak{S}$.

Definition 6.2.1. The product $\Omega_2\Omega_1$ of the operators $\Omega_1$ and
$\Omega_2$ (in that order) is defined by the relation

$(\Omega_2\Omega_1)x = \Omega_2(\Omega_1x)$   $(x \in \mathfrak{S})$.   (6.2.1)

In other words, $(\Omega_2\Omega_1)x$ denotes the element obtained when x is
first operated on by $\Omega_1$, and $\Omega_1x$ then operated on by $\Omega_2$. In view
of this definition we may, without ambiguity, write $\Omega_2\Omega_1x$ for
either side in (6.2.1). The notion of multiplication of operators can,
of course, be extended immediately to any number of factors. Using
informal language we may say that when we multiply operators we
simply apply them successively; or that the product of operators
is their resultant.

Theorem 6.2.1.† Multiplication of operators is associative.

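Let $\Omega_1$, $\Omega_2$, $\Omega_3$ be operators mapping a set $\mathfrak{S}$ into itself and let
x be any element of $\mathfrak{S}$. Writing $\Omega_3x = y$ and using Definition
6.2.1, we obtain

$\{\Omega_1(\Omega_2\Omega_3)\}x = \Omega_1\{(\Omega_2\Omega_3)x\} = \Omega_1\{\Omega_2(\Omega_3x)\} = \Omega_1(\Omega_2y)$
$\qquad = (\Omega_1\Omega_2)y = (\Omega_1\Omega_2)(\Omega_3x) = \{(\Omega_1\Omega_2)\Omega_3\}x$.

Thus $\Omega_1(\Omega_2\Omega_3) = (\Omega_1\Omega_2)\Omega_3$,   (6.2.2)

and the theorem is proved.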
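Definition 6.2.1 and Theorem 6.2.1 translate directly into function composition. In this sketch `compose` is our own helper (the right-most operator acts first, as in (6.2.1)), and three row operations on 2 × 2 matrices serve as sample operators. Associativity holds automatically, while the last line shows that a product of operators is not in general commutative.

```python
def compose(*ops):
    """Product of operators in the sense of Definition 6.2.1.

    compose(W2, W1)(x) == W2(W1(x)): the right-most operator is applied first.
    """
    def product(x):
        for op in reversed(ops):
            x = op(x)
        return x
    return product

# Three E-operations on 2 x 2 matrices (lists of rows), used as sample operators.
def w1(M):            # R1 <-> R2
    return [M[1][:], M[0][:]]

def w2(M):            # R1 -> 10 R1
    return [[10 * v for v in M[0]], M[1][:]]

def w3(M):            # R2 -> R2 + R1
    return [M[0][:], [a + b for a, b in zip(M[1], M[0])]]

A = [[1, 2], [3, 4]]
left = compose(w1, compose(w2, w3))     # W1(W2 W3)
right = compose(compose(w1, w2), w3)    # (W1 W2) W3
print(left(A) == right(A))              # True
print(compose(w1, w2)(A), compose(w2, w1)(A))   # [[3, 4], [10, 20]] [[30, 40], [1, 2]]
```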

It follows that we may, without ambiguity, write $\Omega_1\Omega_2\Omega_3$ for
either side of (6.2.2).

6.2.2. When we are given a set $\mathfrak{O}$ of operators each of which
maps $\mathfrak{S}$ into itself, we sometimes require to know under what
circumstances there exists an operator $\Omega \in \mathfrak{O}$ such that $\Omega x = x^*$,
where x, x* are given elements of $\mathfrak{S}$.


† This theorem is not used in the present chapter.



VI, §6.2 EQUIVALENT MATRICES 173 

Our present discussion is concerned with the set $\mathfrak{S}$ of matrices
and the set $\mathfrak{O}$ of operators each of which is a product of E-operations.
For this special case the question just formulated in general terms
takes the following form. When is it possible to pass from a matrix
A to a matrix B (of the same type) by a chain of E-operations?
The main object of the present section is to answer this question.

We begin with a result which asserts the possibility of reducing
every matrix to some particularly simple standard form by means
of E-operations.

Theorem 6.2.2. (Reduction of matrices to normal form)

Any m × n matrix of rank r can be reduced to the form

$\begin{pmatrix} I_r & O \\ O & O \end{pmatrix}$,   (6.2.3)

where each O denotes a zero matrix of the appropriate type, by a chain
of elementary operations. Conversely, the matrix (6.2.3) can
be transformed into the original matrix by a chain of elementary
operations.

The matrix (6.2.3) is known as the normal form of the original
matrix. It will, for brevity, be denoted by $N_r$.

The expression (6.2.3) for the normal form has, of course, to be
modified in an obvious way if r = 0, r = m, or r = n. In particular,
if the original matrix is a non-singular n × n matrix, then its normal
form is simply $I_n$.

In view of Exercise 6.1.1, the second part of the theorem is a
trivial consequence of the first; and in the proof of the first we may
obviously assume that the given matrix is not a zero matrix, since
otherwise it already has the required form. If the leading element†
vanishes it is possible, therefore, to bring a non-zero element into
leading position by means of E-operations of type (i). Let the
matrix so obtained be denoted by $A = (a_{ij})$, where $a_{11} \ne 0$. The
successive application of the E-operations $R_i \to R_i - a_{i1}a_{11}^{-1}R_1$
(i = 2,...,m) then transforms A into the matrix

$\begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ 0 & b_{22} & \cdots & b_{2n} \\ \vdots & \vdots & & \vdots \\ 0 & b_{m2} & \cdots & b_{mn} \end{pmatrix}$



† The leading element of a matrix is the element standing in the top left-hand
corner.





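(where $b_{ij} = a_{ij} - a_{i1}a_{11}^{-1}a_{1j}$). We next apply the operation $R_1 \to a_{11}^{-1}R_1$
and follow this by the operations $C_j \to C_j - a_{11}^{-1}a_{1j}C_1$
(j = 2,...,n). These transformations reduce the matrix to the
form

$\begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & b_{22} & \cdots & b_{2n} \\ \vdots & \vdots & & \vdots \\ 0 & b_{m2} & \cdots & b_{mn} \end{pmatrix}$,

which can be written as

$\begin{pmatrix} 1 & O \\ O & C \end{pmatrix}$,   (6.2.4)

where $C = (b_{ij})$ (i = 2,...,m; j = 2,...,n) is an (m−1) × (n−1) matrix. If C = O our reduction is
complete, but if C ≠ O, then we treat it in the same way as the
original matrix. This does not affect the first row and first column
in (6.2.4) and leads to the matrix

$\begin{pmatrix} I_2 & O \\ O & D \end{pmatrix}$,

where D is an (m−2) × (n−2) matrix. Continuing in this way as
long as the bottom right-hand matrix is not zero, we ultimately
obtain the matrix

$\begin{pmatrix} I_s & O \\ O & O \end{pmatrix}$,

whose blocks have s and m − s rows and s and n − s columns.
Now the rank of this matrix is s; hence, by Theorem 6.1.1, s = r,
and the proof is complete.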


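We shall illustrate the procedure described above by actually
carrying out the reduction of the matrix

$\begin{pmatrix} 0 & 3 & -3 & 1 \\ 5 & 9 & -10 & 3 \\ -1 & 0 & 5 & -2 \\ 2 & 1 & -3 & 1 \end{pmatrix}$.   (6.2.5)

The operation $C_1 \leftrightarrow C_4$ transforms this into

$\begin{pmatrix} 1 & 3 & -3 & 0 \\ 3 & 9 & -10 & 5 \\ -2 & 0 & 5 & -1 \\ 1 & 1 & -3 & 2 \end{pmatrix}$.

Next, applying in succession the operations

$R_2 \to R_2 - 3R_1$, $R_3 \to R_3 + 2R_1$, $R_4 \to R_4 - R_1$, $C_2 \to C_2 - 3C_1$, $C_3 \to C_3 + 3C_1$,

we obtain

$\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & -1 & 5 \\ 0 & 6 & -1 & -1 \\ 0 & -2 & 0 & 2 \end{pmatrix}$.

The operation $C_2 \leftrightarrow C_3$ transforms this matrix into

$\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 5 \\ 0 & -1 & 6 & -1 \\ 0 & 0 & -2 & 2 \end{pmatrix}$.

We now apply the operations $R_3 \to R_3 - R_2$, $R_2 \to -R_2$,
$C_4 \to C_4 + 5C_2$, and are led to

$\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 6 & -6 \\ 0 & 0 & -2 & 2 \end{pmatrix}$.

The final stage consists of the operations $R_4 \to R_4 + \tfrac{1}{3}R_3$, $R_3 \to \tfrac{1}{6}R_3$,
$C_4 \to C_4 + C_3$. This reduces the matrix to

$\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}$,

which is, in fact, the normal form $N_3$.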

Throughout the above series of operations we have followed
precisely the procedure laid down in the proof of Theorem 6.2.2.
But in numerical cases it is usually possible to introduce ad hoc
modifications which simplify the work, and the reader should have
no difficulty in performing the reduction of the matrix (6.2.5) in
fewer steps than were required above.
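The reduction in the proof of Theorem 6.2.2 can be sketched as a short program. This is an illustration under our own assumptions: exact arithmetic with `fractions`, and a pivot search that takes the first non-zero element of the bottom right-hand block in row-by-row order. Because of that search rule the logged operations need not coincide with those used above (for (6.2.5) the first step is $C_1 \leftrightarrow C_2$ rather than $C_1 \leftrightarrow C_4$), but the normal form reached is the same, and the number of unit elements in it is the rank.

```python
from fractions import Fraction

def normal_form(rows):
    """Reduce a matrix to its normal form by E-operations (after the proof of Theorem 6.2.2).

    Returns (N, r, ops): the normal form, the rank r, and a log of the operations.
    """
    a = [[Fraction(v) for v in row] for row in rows]
    m, n = len(a), len(a[0])
    ops, k = [], 0
    while k < min(m, n):
        # find a non-zero element in the bottom right-hand block (row-by-row search)
        pos = next(((i, j) for i in range(k, m) for j in range(k, n) if a[i][j] != 0), None)
        if pos is None:
            break                                   # remaining block is zero: done
        i, j = pos
        if i != k:                                  # bring it to the leading position
            a[k], a[i] = a[i], a[k]
            ops.append(f"R{k + 1} <-> R{i + 1}")
        if j != k:
            for row in a:
                row[k], row[j] = row[j], row[k]
            ops.append(f"C{k + 1} <-> C{j + 1}")
        for i in range(k + 1, m):                   # R_i -> R_i - (a_ik / a_kk) R_k
            if a[i][k] != 0:
                f = a[i][k] / a[k][k]
                a[i] = [x - f * y for x, y in zip(a[i], a[k])]
                ops.append(f"R{i + 1} -> R{i + 1} - ({f}) R{k + 1}")
        if a[k][k] != 1:                            # R_k -> (1 / a_kk) R_k
            p = a[k][k]
            a[k] = [x / p for x in a[k]]
            ops.append(f"R{k + 1} -> ({1 / p}) R{k + 1}")
        for j in range(k + 1, n):                   # C_j -> C_j - a_kj C_k
            if a[k][j] != 0:
                f = a[k][j]
                for row in a:
                    row[j] -= f * row[k]
                ops.append(f"C{j + 1} -> C{j + 1} - ({f}) C{k + 1}")
        k += 1
    return a, k, ops

M = [[0, 3, -3, 1], [5, 9, -10, 3], [-1, 0, 5, -2], [2, 1, -3, 1]]   # matrix (6.2.5)
N, r, ops = normal_form(M)
print(r)                                      # 3
print([[int(x) for x in row] for row in N])   # [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0]]
print(len(ops), ops[0])
```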

The technique exemplified in the preceding example is particularly
useful in the determination of rank. As we know by
Theorem 6.1.1, the rank of a matrix remains unchanged when the
matrix is subjected to E-operations. When, therefore, we wish to
determine the rank of a given matrix A we apply a chain of suitable
E-operations which transform A into some matrix A' whose rank
can be recognized at a glance.† It is not, as a rule, necessary to
carry the reduction as far as the normal form, and often a small
number of steps suffices to effect the necessary transformation.

6.2.3. We are now ready to deal with the problem raised at the
beginning of this section.

Definition 6.2.2. A matrix A is equivalent to a matrix B
(in symbols: A ~ B) if it is possible to pass from A to B by a chain
of E-operations.‡

The relation of equivalence as here defined has three obvious but
important properties.

(i) It is reflexive. This means that A ~ A, a result which follows
at once from Exercise 6.1.2.

(ii) It is symmetric, i.e. A ~ B implies B ~ A. This holds by
virtue of Exercise 6.1.1.

(iii) It is transitive, i.e. the relations A ~ B, B ~ C imply
A ~ C. This is an immediate consequence of Definition 6.2.2.

Since, in particular, the relation of equivalence is symmetric
(in other words, since it is immaterial whether we say that A is
equivalent to B, or B equivalent to A) we can speak simply of
equivalent matrices. Our problem is to determine in what circumstances
two matrices are equivalent.

Theorem 6.2.3. (Equivalence of matrices)

Each of the following two conditions is necessary and sufficient for
two m × n matrices A and B to be equivalent.

(i) There exist non-singular matrices X, Y (of order m, n respectively)
such that XAY = B.

(ii) R(A) = R(B).

Let the condition that A and B should be equivalent be referred
to as (iii). We have then to prove that each of the conditions (i),
(ii), (iii) implies the other two, and it will be sufficient to infer (ii)
from (i), (iii) from (ii), and (i) from (iii).

In the first place, if (i) is satisfied, then (ii) follows by Theorem
5.6.3 (p. 160).

Again, if R(A) = R(B) = r, say, then (by Theorem 6.2.2) A and
B have the same normal form $N_r$. Hence we can pass from A to

† Cf. the remarks at the end of § 6.1.1.

‡ For a theory of equivalence based on row operations only see Birkhoff and
MacLane, 3, 270–9.




B by a chain of E-operations via the matrix $N_r$. Thus A ~ B,
and so (ii) implies (iii).

Finally, suppose that A ~ B. Then, by Theorem 6.1.2, there
exist E-matrices $E_1,\dots,E_k$, $E^{(1)},\dots,E^{(l)}$ such that

$E_1\cdots E_kAE^{(1)}\cdots E^{(l)} = B$.

Hence XAY = B, where

$X = E_1\cdots E_k$,  $Y = E^{(1)}\cdots E^{(l)}$,

and the matrices X, Y are non-singular by Theorem 6.1.3 (i).
Thus (iii) implies (i), and the proof is complete.

Corollary. Similar matrices are equivalent.

It is convenient to have a special term for transformations of
matrices consisting of chains of E-operations.

Definition 6.2.3. An equivalence transformation is a
product of E-operations.

The terminology introduced by this definition is a natural one
since it makes equivalent matrices transformable into each other
by equivalence transformations. It may be noted that E-operations
are special equivalence transformations.

In view of Theorem 6.2.3 we know that A and B are connected by
an equivalence transformation if and only if there exist non-singular
matrices X, Y such that B = XAY. By restricting, in one way or
another, the nature of X and Y we arrive at various special types
of equivalence transformations; and in the course of our subsequent
discussion we shall be led to consider in detail a number of such
types. With two of them, the elementary operations and the
similarity transformations, we are already familiar.

Some further useful results follow immediately from Theorem 
6.2.3. 

Theorem 6.2.4. Let A be an m × n matrix of rank r. Then there
exist non-singular matrices X, Y (of order m, n respectively) such that

$A = XN_rY$.

The matrices A and $N_r$ are equivalent by virtue of Theorem
6.2.2, and the assertion therefore follows by Theorem 6.2.3 (i).
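Theorem 6.2.4 can be made constructive by recording the operations: every row operation applied to A is applied also to a matrix X that starts as $I_m$, and every column operation to a matrix Y that starts as $I_n$, so that XAY equals the current matrix at every stage. The sketch below assumes this bookkeeping scheme and the helper names shown; at the end XAY is the normal form (6.2.3), and the non-singular factors required by the theorem are then $X^{-1}$ and $Y^{-1}$.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a list of rows, by exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        piv = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def normal_form_with_factors(rows):
    """Return (N, X, Y, r) with X A Y = N, the normal form of A."""
    a = [[Fraction(v) for v in row] for row in rows]
    m, n = len(a), len(a[0])
    X = [[Fraction(int(i == j)) for j in range(m)] for i in range(m)]
    Y = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    k = 0
    while k < min(m, n):
        pos = next(((i, j) for i in range(k, m) for j in range(k, n) if a[i][j] != 0), None)
        if pos is None:
            break
        i, j = pos
        for M in (a, X):                        # R_k <-> R_i (no-op if i == k)
            M[k], M[i] = M[i], M[k]
        for M in (a, Y):                        # C_k <-> C_j (no-op if j == k)
            for row in M:
                row[k], row[j] = row[j], row[k]
        p = a[k][k]
        for M in (a, X):                        # R_k -> (1/p) R_k
            M[k] = [x / p for x in M[k]]
        for i in range(k + 1, m):               # R_i -> R_i - a_ik R_k
            f = a[i][k]
            for M in (a, X):
                M[i] = [x - f * y for x, y in zip(M[i], M[k])]
        for j in range(k + 1, n):               # C_j -> C_j - a_kj C_k
            f = a[k][j]
            for M in (a, Y):
                for row in M:
                    row[j] -= f * row[k]
        k += 1
    return a, X, Y, k

A = [[1, 2, 3, 4], [2, 4, 6, 8], [1, 0, 1, 0]]
N, X, Y, r = normal_form_with_factors(A)
print(r, matmul(matmul(X, A), Y) == N, rank(X), rank(Y))   # 2 True 3 4
```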

Theorem 6.2.5. Every non-singular square matrix can be
expressed as a product of elementary matrices.





Let A be a non-singular n × n matrix. By Theorem 6.2.3, A and $I_n$
are equivalent, i.e. A can be derived from $I_n$ by a chain of
E-operations. But, by Theorem 6.1.2, each such operation is
effected by premultiplication or postmultiplication by an E-matrix.
Hence

$A = E_1\cdots E_kI_nE^{(1)}\cdots E^{(l)} = E_1\cdots E_kE^{(1)}\cdots E^{(l)}$,

where the $E_i$ and $E^{(j)}$ are suitable E-matrices.


6.3. Applications of the preceding theory

6.3.1. We begin by explaining a practical method of computing
inverses of non-singular matrices. We know by Theorem 6.2.2 that
such matrices can be reduced to the unit matrix by a chain of
E-operations. We need, however, a slightly stronger result.

Theorem 6.3.1. A non-singular matrix of order n can be reduced
to $I_n$ by means of elementary row operations only.

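If a matrix A is non-singular at least one element in the first
column is non-zero. Hence, interchanging two rows (if necessary)
we can bring a non-zero element into leading position. Subtracting
now suitable multiples of the first row from the other rows we
obtain a matrix in which all elements in the first column, other than
the leading element, are equal to zero. The leading element is then
made equal to 1 by multiplying the first row by a suitable constant.
In this way we obtain a matrix of the form

$\begin{pmatrix} 1 & b_{12} & \cdots & b_{1n} \\ 0 & b_{22} & \cdots & b_{2n} \\ \vdots & \vdots & & \vdots \\ 0 & b_{n2} & \cdots & b_{nn} \end{pmatrix}$.

Applying now the same technique to the non-singular submatrix

$\begin{pmatrix} b_{22} & \cdots & b_{2n} \\ \vdots & & \vdots \\ b_{n2} & \cdots & b_{nn} \end{pmatrix}$

and continuing the process for as long as is necessary, we obtain
ultimately a triangular matrix of the form

$\begin{pmatrix} 1 & p_{12} & p_{13} & \cdots & p_{1n} \\ 0 & 1 & p_{23} & \cdots & p_{2n} \\ 0 & 0 & 1 & \cdots & p_{3n} \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{pmatrix}$.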








VI, §6.3 APPLICATIONS OF THE PRECEDING THEORY 179 

All elements above the diagonal can now be made equal to 0 by 
means of row operations of type (iii), and the reduction of A to I has 
thus been proved possible. 
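The proof of Theorem 6.3.1 is an algorithm, and it can be sketched as follows under our own assumptions (exact arithmetic with `fractions`; hypothetical helper names). Each row operation is performed as premultiplication by the corresponding E-matrix, as Theorem 6.1.2 permits, so that at the end $E_k\cdots E_1A = I$. For the 3 × 3 matrix used in the code, seven operations are needed.

```python
from fractions import Fraction

def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def reduce_to_identity(rows):
    """Reduce a non-singular matrix to I_n by row operations only.

    Follows the order used in the proof of Theorem 6.3.1: work down the columns
    to reach a unit upper-triangular matrix, then clear the entries above the
    diagonal. Returns the E-matrices E_1, ..., E_k in the order applied.
    """
    a = [[Fraction(v) for v in row] for row in rows]
    n = len(a)
    es = []

    def apply(E):
        nonlocal a
        es.append(E)
        a = matmul(E, a)          # Theorem 6.1.2: row operation = premultiplication

    for c in range(n):
        piv = next((i for i in range(c, n) if a[i][c] != 0), None)
        if piv is None:
            raise ValueError("matrix is singular")
        if piv != c:                                   # R_c <-> R_piv
            E = identity(n)
            E[c], E[piv] = E[piv], E[c]
            apply(E)
        for i in range(c + 1, n):                      # R_i -> R_i - (a_ic / a_cc) R_c
            if a[i][c] != 0:
                E = identity(n)
                E[i][c] = -a[i][c] / a[c][c]
                apply(E)
        if a[c][c] != 1:                               # R_c -> (1 / a_cc) R_c
            E = identity(n)
            E[c][c] = 1 / a[c][c]
            apply(E)
    for c in range(n - 1, 0, -1):                      # clear above the diagonal
        for i in range(c):
            if a[i][c] != 0:
                E = identity(n)
                E[i][c] = -a[i][c]
                apply(E)
    return es

A = [[1, 1, -3], [-1, 0, 2], [-3, 5, 0]]
es = reduce_to_identity(A)
P = identity(3)
for E in es:
    P = matmul(E, P)              # P = E_k ... E_1
print(len(es), matmul(P, A) == identity(3))   # 7 True
```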

Exercise 6.3.1. Show that, by the application of elementary row operations,
any rectangular matrix can be reduced to the form $P = (p_{ij})$, where
$p_{ij} = 0$ whenever i > j.

Theorem 6.3.2. Suppose that a non-singular matrix A is reduced to I by a sequence, say $\Omega_1, \ldots, \Omega_k$, of elementary row operations. Then the application of these operations, in the same order, transforms I into $A^{-1}$.

By Theorem 6.1.2 the application of $\Omega_1, \ldots, \Omega_k$ respectively is equivalent to premultiplication by certain E-matrices, say $E_1, \ldots, E_k$. Hence, by hypothesis,

$$E_k \cdots E_1 A = I,$$

and so, postmultiplying by $A^{-1}$, we obtain

$$A^{-1} = E_k \cdots E_1 I.$$

Consequently $A^{-1}$ is obtained when the operations $\Omega_1, \ldots, \Omega_k$ are applied, in that order, to I.
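Theorem 6.3.2 means that the E-matrices never have to be formed explicitly. It suffices to perform every row operation simultaneously on A and on a copy of I. The Python sketch below, under the hypothetical name `inverse_by_row_operations` and using exact fractions, does exactly this. It chooses its operations in Gauss-Jordan order, clearing each column above and below the pivot at once, which differs from the order used in the proof of Theorem 6.3.1. Since the theorem applies to any sequence of row operations that reduces A to I, the result is still $A^{-1}$.

```python
from fractions import Fraction


def inverse_by_row_operations(A):
    """Invert a non-singular matrix in the manner of Theorem 6.3.2 (a sketch).

    Every row operation that moves A towards I is applied, at the same time,
    to a copy N of I; when A has become I, N has become A^{-1}.
    """
    n = len(A)
    M = [[Fraction(x) for x in row] for row in A]
    N = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

    def swap(r, s):            # R_r <-> R_s on both matrices
        for X in (M, N):
            X[r], X[s] = X[s], X[r]

    def scale(r, c):           # R_r -> c R_r on both matrices
        for X in (M, N):
            X[r] = [c * x for x in X[r]]

    def add(r, s, c):          # R_r -> R_r + c R_s on both matrices
        for X in (M, N):
            X[r] = [x + c * y for x, y in zip(X[r], X[s])]

    for j in range(n):
        p = next((i for i in range(j, n) if M[i][j] != 0), None)
        if p is None:
            raise ValueError("matrix is singular")
        if p != j:
            swap(j, p)
        scale(j, 1 / M[j][j])
        for i in range(n):
            if i != j and M[i][j] != 0:
                add(i, j, -M[i][j])
    return N


print(inverse_by_row_operations([[2, 1], [5, 3]]))
```

For instance, the inverse of the 2×2 matrix above, whose determinant is 1, is returned as the matrix with rows (3, −1) and (−5, 2), with `Fraction` entries.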

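To illustrate the result just proved we consider the matrix

$$A = \begin{pmatrix} 1 & 1 & -3 \\ -1 & 0 & 2 \\ -3 & 5 & 0 \end{pmatrix}.$$

The reader should verify that A can be reduced to $I_3$ by applying, in turn, the operations

$$R_2 \to R_2 + R_1,\quad R_3 \to R_3 + 3R_1,\quad R_3 \to R_3 - 8R_2,\quad R_3 \to -R_3,\quad R_2 \to R_2 + R_3,\quad R_1 \to R_1 + 3R_3,\quad R_1 \to R_1 - R_2.$$

If these operations are applied, in the same order, to $I_3$ we obtain the following sequence of matrices:

$$\begin{pmatrix} 1&0&0\\ 1&1&0\\ 0&0&1 \end{pmatrix},\quad \begin{pmatrix} 1&0&0\\ 1&1&0\\ 3&0&1 \end{pmatrix},\quad \begin{pmatrix} 1&0&0\\ 1&1&0\\ -5&-8&1 \end{pmatrix},\quad \begin{pmatrix} 1&0&0\\ 1&1&0\\ 5&8&-1 \end{pmatrix},$$

$$\begin{pmatrix} 1&0&0\\ 6&9&-1\\ 5&8&-1 \end{pmatrix},\quad \begin{pmatrix} 16&24&-3\\ 6&9&-1\\ 5&8&-1 \end{pmatrix},\quad \begin{pmatrix} 10&15&-2\\ 6&9&-1\\ 5&8&-1 \end{pmatrix}.$$

The last matrix in the sequence is the required matrix $A^{-1}$.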
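The arithmetic of this example is easily checked mechanically. The Python sketch below applies the seven listed operations, in order, first to A and then to $I_3$, and confirms that the second result really is the inverse of A. The encoding of an operation as a tuple, and the name `apply_ops`, are assumptions made for illustration only.

```python
from fractions import Fraction

A = [[1, 1, -3], [-1, 0, 2], [-3, 5, 0]]
I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

# The operations of the example, rows numbered from 1 as in the text:
# ("add", r, s, c) stands for R_r -> R_r + c R_s, ("scale", r, c) for R_r -> c R_r.
OPS = [("add", 2, 1, 1), ("add", 3, 1, 3), ("add", 3, 2, -8), ("scale", 3, -1),
       ("add", 2, 3, 1), ("add", 1, 3, 3), ("add", 1, 2, -1)]


def apply_ops(M, ops):
    """Apply a list of row operations, in order, to a copy of M."""
    M = [[Fraction(x) for x in row] for row in M]
    for op in ops:
        if op[0] == "add":
            _, r, s, c = op
            M[r - 1] = [x + c * y for x, y in zip(M[r - 1], M[s - 1])]
        else:
            _, r, c = op
            M[r - 1] = [c * x for x in M[r - 1]]
    return M


assert apply_ops(A, OPS) == I3                   # the operations reduce A to I_3
A_inv = apply_ops(I3, OPS)                       # and transform I_3 into A^{-1}
product = [[sum(A[i][k] * A_inv[k][j] for k in range(3)) for j in range(3)]
           for i in range(3)]
assert product == I3
print([[int(x) for x in row] for row in A_inv])  # [[10, 15, -2], [6, 9, -1], [5, 8, -1]]
```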



A procedure for finding inverses which involves both row operations and column operations can also be devised without difficulty since, given a non-singular matrix A, E-matrices $E_1, \ldots, E_k, E^{(1)}, \ldots, E^{(l)}$ can be found such that

$$E_k \cdots E_1 A E^{(1)} \cdots E^{(l)} = I,$$

whence $A^{-1} = E^{(1)} \cdots E^{(l)} E_k \cdots E_1$. This procedure is not, however, as convenient as that given by Theorem 6.3.2.

6.3.2. An important use of E-operations (or, more precisely, of elementary row operations) can be made in the theory of linear equations.

If in a system S of linear equations we interchange the position of two equations, or multiply an equation (that is, its coefficients and its right-hand side) by some non-zero constant, or add any multiple of one equation to another, then the resulting system is plainly equivalent to S. Now a system of linear equations can be conveniently represented by its augmented matrix, and the transformations just listed correspond, in fact, to E-operations on the rows of this matrix. Thus, for instance, the addition of twice the third equation to the fifth equation corresponds to the operation $R_5 \to R_5 + 2R_3$ on the augmented matrix.

It follows, then, that if we subject the augmented matrix B of S to a chain of elementary row operations, the resulting matrix will be the augmented matrix of a system of linear equations equivalent to S. The obvious procedure is, therefore, to operate on B in such a way as to obtain a matrix which shall be as simple as possible;† and Exercise 6.3.1 shows that we can always reduce B to some matrix $P = (p_{ij})$ in which $p_{ij} = 0$ for $i > j$. The system of which P is the augmented matrix is then far more manageable than the original system. It is not, of course, necessary in every case to effect the complete reduction of B to the quasi-triangular form P, for often we recognize even after a few steps whether the original system is consistent, whether it possesses non-trivial solutions, or whether certain equations are redundant.

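† This amounts, in practice, to having as many zero elements as possible.

As an example we may consider the system of equations (5.3.6) discussed in § 5.3.2. Performing the operations

$$R_1 \leftrightarrow R_2,\quad R_2 \to R_2 - 3R_1,\quad R_4 \to R_4 - 2R_1,\quad R_2 \to \tfrac{1}{5}R_2,\quad R_4 \to -\tfrac{1}{7}R_4$$

on the augmented matrix

$$\begin{pmatrix} 3 & -7 & 14 & -8 & 24 \\ 1 & -4 & 3 & -1 & -2 \\ 0 & 1 & 1 & -1 & 6 \\ 2 & -15 & -1 & 5 & -46 \end{pmatrix}$$

we reduce it to the form

$$\begin{pmatrix} 1 & -4 & 3 & -1 & -2 \\ 0 & 1 & 1 & -1 & 6 \\ 0 & 1 & 1 & -1 & 6 \\ 0 & 1 & 1 & -1 & 6 \end{pmatrix}.$$

The last three rows are identical, so the last two equations are redundant, and the original system (5.3.6) is therefore equivalent to the system consisting of the two equations

$$x - 4y + 3z - w = -2, \qquad y + z - w = 6.$$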

We can now take y and z as the disposable unknowns and complete 
the solution as before. 
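Taking y and z as the disposable unknowns, the two remaining equations give $w = y + z - 6$ and $x = 5y - 2z - 8$. The short Python check below is offered only as a sketch. It substitutes these expressions into all four equations represented by the augmented matrix of (5.3.6), as reproduced above, for a few sample values of y and z. The function name `general_solution` and the sample values are hypothetical.

```python
from fractions import Fraction

# Augmented matrix of system (5.3.6): columns x, y, z, w | right-hand side.
B = [[3, -7, 14, -8, 24],
     [1, -4, 3, -1, -2],
     [0, 1, 1, -1, 6],
     [2, -15, -1, 5, -46]]


def general_solution(y, z):
    """Solve x - 4y + 3z - w = -2, y + z - w = 6 for x and w, given y and z."""
    w = y + z - 6
    x = -2 + 4 * y - 3 * z + w      # simplifies to x = 5y - 2z - 8
    return x, y, z, w


for y in range(-3, 4):
    for z in (Fraction(-1, 2), 0, Fraction(7, 3)):
        solution = general_solution(y, z)
        for *coefficients, rhs in B:
            assert sum(a * v for a, v in zip(coefficients, solution)) == rhs
print("every choice of y and z gives a solution of all four equations")
```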

The procedure just indicated is not, of course, at all surprising since it amounts to little more than the familiar manipulation of simultaneous linear equations as described in school textbooks. However, by pursuing the idea a little further, we could give alternative proofs of some of the results established in § 5.3 and § 5.4. We must refer the reader elsewhere for details.†

† See, for instance, Birkhoff and MacLane, 3, 50-63, and Stoll, 7, chap. i.

6.3.3. As another application of the theory developed in § 6.1 and § 6.2 we shall give a new proof of Theorem 5.6.2 (p. 159).

The first step is to prove Theorem 5.6.3. Let S be an $m \times n$ matrix and X a non-singular matrix of order m. By Theorem 6.2.5, X can be expressed as a product of E-matrices, say

$$X = E_1 \cdots E_k.$$

Now, by Theorem 6.1.2, multiplication by an E-matrix is equivalent to an E-operation and since, by Theorem 6.1.1, E-operations preserve rank we have

$$R(S) = R(E_k S) = R(E_{k-1} E_k S) = \cdots = R(E_1 \cdots E_k S).$$

Thus

$$R(XS) = R(S). \tag{6.3.1}$$

We next come to the proof of Theorem 5.6.2, i.e. effectively of the inequality $R(AB) \le R(B)$. We assume that A, B are of type $m \times n$, $n \times p$ respectively and that $R(A) = r$, and we write

$$\begin{pmatrix} I_r & O \\ O & O \end{pmatrix} = N.$$

Let P be an $n \times p$ matrix and represent it in the partitioned form

$$P = \begin{pmatrix} P_1 \\ P_2 \end{pmatrix},$$

where $P_1$ denotes the matrix consisting of the first r rows of P, and $P_2$ that consisting of the remaining $n - r$ rows. We then have

$$NP = \begin{pmatrix} I_r & O \\ O & O \end{pmatrix} \begin{pmatrix} P_1 \\ P_2 \end{pmatrix} = \begin{pmatrix} P_1 \\ O \end{pmatrix}.$$

Hence $R(NP) = R(P_1)$ and, since the rows of $P_1$ are among the rows of P, therefore

$$R(NP) \le R(P). \tag{6.3.2}$$

Now, by Theorem 6.2.4, A may be written in the form $A = XNY$, where X, Y are non-singular matrices of order m, n respectively. Hence, by (6.3.1) and (6.3.2),

$$R(AB) = R(X \cdot NYB) = R(N \cdot YB) \le R(YB) = R(B),$$

and Theorem 5.6.2 is therefore established.
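Both (6.3.1) and the inequality $R(AB) \le R(B)$ are easy to test numerically. The Python sketch below is an illustration rather than part of the proof. It computes ranks exactly by row reduction, using a hypothetical helper `rank`, and checks the two statements on randomly generated integer matrices. The sizes, the entry range, the fixed matrix X and the random seed are arbitrary choices.

```python
import random
from fractions import Fraction


def rank(M):
    """Rank of a matrix, computed exactly by elementary row operations."""
    M = [[Fraction(x) for x in row] for row in M]
    r = 0
    for j in range(len(M[0])):
        p = next((i for i in range(r, len(M)) if M[i][j] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        for i in range(r + 1, len(M)):
            c = M[i][j] / M[r][j]
            M[i] = [x - c * y for x, y in zip(M[i], M[r])]
        r += 1
    return r


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


random.seed(0)
X = [[1, 2], [3, 5]]                           # non-singular: |X| = -1
for _ in range(300):
    m, n, p = (random.randint(1, 4) for _ in range(3))
    A = [[random.randint(-1, 1) for _ in range(n)] for _ in range(m)]
    B = [[random.randint(-1, 1) for _ in range(p)] for _ in range(n)]
    assert rank(matmul(A, B)) <= rank(B)       # R(AB) <= R(B)
    S = [[random.randint(-2, 2) for _ in range(p)] for _ in range(2)]
    assert rank(matmul(X, S)) == rank(S)       # (6.3.1), with m = 2
print("all checks passed")
```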

6.4. Congruence transformations 

It was mentioned earlier that various special classes of equivalence transformations play an important part in linear algebra. In the present section we propose to consider one such class, namely the class of congruence transformations. Its precise significance will not, however, become fully apparent until we come to the discussion of quadratic forms in Chapter XII.

Unless the contrary is stated, all matrices considered below are 
assumed to be square matrices of order n. 

Definition 6.4.1. If $B = P^T A P$, where P is some non-singular matrix, then B is said to be obtained from A by a CONGRUENCE TRANSFORMATION, and B is said to be congruent to A.

We observe at once that a congruence transformation is a special equivalence transformation. We also observe that, since

$$Q^T (P^T A P) Q = (PQ)^T A (PQ),$$

the product of two congruence transformations is again a congruence transformation.

Exercise 6.4.1. Show that the relation of congruence is reflexive, symmetric and transitive.

Since, in view of this exercise, the statements ‘A is congruent 
to B’ and ‘B is congruent to A’ imply each other, it follows that we 
may speak simply of two matrices as being congruent (to each 
other). 

The problem we propose to discuss below is suggested by Theorem 
6.2.2. That result shows, in particular, that every square matrix 
can be reduced to diagonal form by an equivalence transformation. 
We now ask ourselves whether this statement continues to be valid 
if instead of considering equivalence transformations we restrict 
ourselves to the narrower class of congruence transformations. 
The answer to this question is contained in Theorem 6.4.1 which we prove by the use of E-operations and E-matrices.

Definition 6.4.2. The square matrix $A = (a_{ij})$ of order n is SYMMETRIC if $a_{ij} = a_{ji}$ $(i, j = 1, \ldots, n)$, i.e. if $A^T = A$.†

The symmetry mentioned in the definition is, of course, symmetry with respect to the diagonal.

Exercise 6.4.2. If A is a rectangular matrix, show that $A^T A$ is a symmetric (square) matrix.

Theorem 6.4.1. A matrix is congruent to a diagonal matrix if and only if it is symmetric.

(i) One part of the theorem is trivial. For if A is congruent to a diagonal matrix, then there exists a non-singular matrix P and a diagonal matrix D such that $A = P^T D P$. Since diagonal matrices are obviously symmetric, this implies that

$$A^T = P^T D^T P = P^T D P = A,$$

and so A is symmetric.

(ii) Next, assuming that $A = (a_{ij})$ is symmetric, we have to show that it can be reduced to diagonal form by a congruence transformation.

We first make some preliminary observations. Suppose that $i \ne j$. Let $E_{(ij)}$ be the E-matrix obtained from I by the operation $R_i \leftrightarrow R_j$ (or alternatively by the operation $C_i \leftrightarrow C_j$). We then clearly have

$$E_{(ij)}^T = E_{(ij)}.$$

† For some interesting results on the rank of symmetric matrices see Stoll, 7,



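If $\Omega_{(ij)}$ denotes the product (in either order) of the operations $R_i \leftrightarrow R_j$, $C_i \leftrightarrow C_j$, then, by Theorem 6.1.2,

$$\Omega_{(ij)}(A) = E_{(ij)} A E_{(ij)} = E_{(ij)}^T A E_{(ij)}. \tag{6.4.1}$$

Again, let $G_{(ij)}$ be the E-matrix obtained from I by the operation $R_i \to R_i + R_j$ or, alternatively, by the operation $C_j \to C_j + C_i$. It is then at once clear that

$$G_{(ij)}^T = G_{(ji)}.$$

If $\Omega^*_{(ij)}$ denotes the product (in either order) of the operations $R_i \to R_i + R_j$, $C_i \to C_i + C_j$, then

$$\Omega^*_{(ij)}(A) = G_{(ij)} A G_{(ji)} = G_{(ji)}^T A G_{(ji)}. \tag{6.4.2}$$

Equations (6.4.1) and (6.4.2) show that both $\Omega_{(ij)}$ and $\Omega^*_{(ij)}$ are congruence transformations.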

We now come to the main part of the proof. Assume, as may be done, that $A = (a_{ij}) \ne O$. Then at least one element of A is non-zero, and we shall show that the leading element is either non-zero or else can be made so by a congruence transformation. If $a_{11} = 0$, but $a_{ii} \ne 0$ for some i in the range $2 \le i \le n$, then the operator $\Omega_{(1i)}$ effects the required change. If, on the other hand, $a_{ii} = 0$ for $1 \le i \le n$, then there exists some $a_{ij} \ne 0$ with $i \ne j$. In that case we apply $\Omega^*_{(ij)}$, and it is easily seen that, since $a_{ii} = a_{jj} = 0$, the $(i, i)$th element of $\Omega^*_{(ij)}(A)$ is $a_{ij} + a_{ji} = 2a_{ij} \ne 0$. If $i = 1$ this means that the leading element has now been made non-zero, while if $i > 1$ the required change is brought about by the additional application of $\Omega_{(1i)}$. Thus it is always possible to place a non-zero element in the leading position by applying a suitable congruence transformation. The resulting matrix will, of course, be again symmetric. Thus there exists a non-singular matrix P such that

$$P^T A P = \begin{pmatrix} b & x^T \\ x & B \end{pmatrix},$$

where $b \ne 0$, x is a column vector of order $n - 1$, and B is a square matrix of order $n - 1$. The matrix

$$Q = \begin{pmatrix} 1 & -b^{-1} x^T \\ 0 & I \end{pmatrix}$$

is plainly non-singular, and we have

$$Q^T \begin{pmatrix} b & x^T \\ x & B \end{pmatrix} Q = \begin{pmatrix} 1 & 0 \\ -b^{-1} x & I \end{pmatrix} \begin{pmatrix} b & x^T \\ x & B \end{pmatrix} \begin{pmatrix} 1 & -b^{-1} x^T \\ 0 & I \end{pmatrix} = \begin{pmatrix} b & 0 \\ 0 & B - b^{-1} x x^T \end{pmatrix}.$$

Thus there exists a non-singular matrix S (= PQ) such that

$$S^T A S = \begin{pmatrix} b & 0 \\ 0 & C \end{pmatrix},$$

where C is a square matrix of order $n - 1$. This conclusion has been reached on the assumption that $A \ne O$, but for $A = O$ it holds, of course, trivially. We have, furthermore,

$$(S^T A S)^T = S^T A^T S = S^T A S;$$

thus $S^T A S$ is symmetric and so, therefore, is C.

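The proof is now easily completed by induction. For $n = 1$ the assertion is trivial. Assume that it is true for $n - 1$, where $n \ge 2$. Then there exists a non-singular matrix $U_1$ and a diagonal matrix $D_1$, both of order $n - 1$, such that $U_1^T C U_1 = D_1$. Hence

$$\begin{pmatrix} 1 & 0 \\ 0 & U_1^T \end{pmatrix} \begin{pmatrix} b & 0 \\ 0 & C \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & U_1 \end{pmatrix} = \begin{pmatrix} b & 0 \\ 0 & D_1 \end{pmatrix}.$$

Writing

$$U = \begin{pmatrix} 1 & 0 \\ 0 & U_1 \end{pmatrix}, \qquad D = \begin{pmatrix} b & 0 \\ 0 & D_1 \end{pmatrix},$$

we have, therefore, $U^T (S^T A S) U = D$, i.e. $(SU)^T A (SU) = D$.

Since D is diagonal and SU obviously non-singular, the proof is complete.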
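The proof of Theorem 6.4.1 is in effect an algorithm. Every step applies a row operation together with the corresponding column operation, so each step is a congruence transformation. The Python sketch below, under the hypothetical name `congruence_diagonalize` and with exact rational arithmetic, follows the same case analysis: it uses $\Omega_{(ij)}$ if some diagonal element is non-zero, and otherwise applies $\Omega^*_{(ij)}$ first. It accumulates the non-singular matrix S for which $S^T A S$ is diagonal. The elimination below the leading element, which the proof performs in one stroke with the block matrix Q, is done here one row and column at a time. The sketch assumes, without checking, that the input is symmetric.

```python
from fractions import Fraction


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def congruence_diagonalize(A):
    """For a symmetric matrix A, return (S, D) with S non-singular and S^T A S = D diagonal.

    Each step is a row operation followed by the matching column operation,
    i.e. a congruence transformation; S records the column operations.
    """
    n = len(A)
    M = [[Fraction(x) for x in row] for row in A]
    S = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

    def swap(i, j):
        """Omega_(ij): R_i <-> R_j, then C_i <-> C_j (columns also swapped in S)."""
        M[i], M[j] = M[j], M[i]
        for row in M + S:
            row[i], row[j] = row[j], row[i]

    def add(i, j, c):
        """R_i -> R_i + c R_j, then C_i -> C_i + c C_j; c = 1 gives Omega*_(ij)."""
        M[i] = [x + c * y for x, y in zip(M[i], M[j])]
        for row in M + S:
            row[i] += c * row[j]

    for k in range(n):
        if M[k][k] == 0:
            p = next((i for i in range(k + 1, n) if M[i][i] != 0), None)
            if p is not None:
                swap(k, p)
            else:
                q = next(((i, j) for i in range(k, n) for j in range(i + 1, n)
                          if M[i][j] != 0), None)
                if q is None:
                    break                    # the remaining block is O
                i, j = q
                add(i, j, Fraction(1))       # new (i, i) element is 2 a_ij != 0
                if i != k:
                    swap(k, i)
        # Clear column k below the diagonal (and, by symmetry, row k to the right).
        for i in range(k + 1, n):
            c = -M[i][k] / M[k][k]
            if c != 0:
                add(i, k, c)
    return S, M


A = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]       # symmetric, with zero diagonal
S, D = congruence_diagonalize(A)
assert matmul(matmul(transpose(S), A), S) == D
assert all(D[i][j] == 0 for i in range(3) for j in range(3) if i != j)
print([str(D[i][i]) for i in range(3)])     # ['2', '-1/2', '-12']
```

Only additions of rational multiples and interchanges occur, which reflects the remark below that this is a rational reduction.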


Exercise 6.4.3. Show that if A is a real (complex) symmetric matrix of rank r, then there exists a real (complex) matrix U such that $U^T A U$ is a diagonal matrix having precisely r non-zero diagonal elements and that the value of each of these elements is 1 or −1 (is 1).

It should be noted that the reduction of A to diagonal form in 
the proof above has been carried out entirely by means of rational 
operations. For this reason we speak in this case of a rational 
reduction. In contrast we shall, in Chapter X, consider reductions 
which cannot be effected by rational operations alone. 




6.5. The general concept of equivalence 

6.5.1. It was pointed out after Definition 6.2.2 (p. 176) that 
the relation of equivalence between matrices is reflexive, symmetric, 
and transitive. These three properties occur so persistently in 
different mathematical situations that it is useful to introduce an 
abstract concept of equivalence, of which equivalence between 
matrices is a special instance. 

Definition 6.5.1. Let S be a set of elements, and let ~ be a relation such that if $x, y \in S$, where x and y are not necessarily distinct, then $x \sim y$ either holds or does not hold. Suppose further that the relation ~ is

(i) REFLEXIVE, i.e. $x \sim x$ for every $x \in S$;

(ii) SYMMETRIC, i.e. $x \sim y$ implies $y \sim x$;

(iii) TRANSITIVE, i.e. $x \sim y$ and $y \sim z$ imply $x \sim z$.

Then ~ is said to be an equivalence relation in S, and any two elements x, y such that $x \sim y$ are said to be equivalent in the sense of this particular relation.

The simplest and in some ways the most fundamental instance 
of equivalence is that of equality. Equivalence, as defined above, is 
a natural generalization of equality, and it is interesting to note that 
in dealing with the relation of equality we normally make use only of 
the three properties of reflexiveness, symmetry, and transitiveness.†

Other examples of equivalence come readily to mind. Thus, if m is a positive integer, then two integers a and b are said to be congruent modulo m if $a - b$ is divisible by m. This relation between them is denoted symbolically by $a \equiv b \pmod{m}$, and is called congruence modulo m. Since it plainly satisfies the three axioms, it is an equivalence relation in the set of integers.

We are also familiar with various equivalence relations which 
hold between matrices. The relation defined as equivalence of 
matrices in Definition 6.2.2 is an instance of an equivalence relation 
in our present sense. Congruence of matrices is also an equivalence 
relation, in view of Exercise 6.4.1 (p. 182); and so, too, is similarity, 
as the reader should have no difficulty in verifying. 

Exercise 6.5.1. Show that isomorphism between linear manifolds is an equivalence relation.

† Axiom 1 of Book I of Euclid's Elements states that two things which are equal to a third are equal to each other. This statement expresses, in effect, the transitive property of the relation of equality.



Theorem 6.5.1. Let S be a set of elements for which an equivalence relation has been defined. Then S can be subdivided into classes of elements in such a way that (i) each element belongs to precisely one class, (ii) two elements belong to the same class if and only if they are equivalent.

The proof is almost obvious. If x is any element of S, let $C_x$ denote the class of elements equivalent to x, i.e. the class of all $y \in S$ such that $x \sim y$.

We first show that if $x \sim y$, then $C_x = C_y$.† If $z \in C_x$, then $x \sim z$ and so $z \sim x$. But since $x \sim y$ this implies $z \sim y$, and hence $y \sim z$, i.e. $z \in C_y$. Similarly $z \in C_y$ implies $z \in C_x$, and we have, therefore, $C_x = C_y$.

If, on the other hand, x and y are not equivalent, then the classes $C_x$, $C_y$ are disjoint, i.e. they possess no common element. For assume that $z \in C_x$, $z \in C_y$. Then $x \sim z$, $y \sim z$, and so $x \sim z$, $z \sim y$; hence $x \sim y$, contrary to hypothesis.

We have thus obtained a subdivision of S into disjoint classes such that two elements belong to the same class if and only if they are equivalent. Moreover, in view of the property of reflexiveness, each element $x \in S$ belongs to at least one class, namely to $C_x$; and it belongs to no other class for, as we have seen, two classes are either disjoint or equal.

Definition 6.5.2. The classes of equivalent elements into which a set S is subdivided by means of an equivalence relation ~ are called the equivalence classes (of S, with respect to ~).

If an equivalence relation is defined in S, each equivalence class is determined completely by any one of its elements, and equivalence between elements of S can be replaced by equality between equivalence classes.

The definition of equivalence classes has been given above in terms of equivalence, but the converse procedure can equally well be adopted. If a subdivision of S into disjoint classes C, C', ... of elements is given, we can define an equivalence relation in S with respect to which C, C', ... are the equivalence classes. We simply put $x \sim y$ whenever x and y belong to the same class. The relation ~ then evidently satisfies the three axioms of Definition 6.5.1, and so is an equivalence relation.

† Two classes are said to be equal if they comprise precisely the same elements.



A branch of mathematics in which the idea of equivalence is particularly prominent is that part of analysis in which the construction of the number system is carried out from an initial set of axioms. Thus, in passing from the integers to the rational numbers, we consider ordered pairs (a, b) of integers a, b with $b \ne 0$, and write $(a, b) \sim (c, d)$ whenever $ad = bc$. It is then easily verified that ~ is an equivalence relation. Rational numbers can now be defined as equivalence classes with respect to this relation, and it then remains to be shown that they possess all the properties which we associate intuitively with rational numbers.†
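The construction in the proof of Theorem 6.5.1 translates directly into a procedure for a finite set. Since every class is determined by any one of its elements, each new element need only be compared with one representative of each class found so far. The Python sketch below, under the hypothetical name `equivalence_classes`, assumes that the supplied relation really is an equivalence relation. It illustrates the procedure with congruence modulo 3 and with the relation $(a, b) \sim (c, d)$ whenever $ad = bc$ on pairs with non-zero second component.

```python
def equivalence_classes(elements, related):
    """Partition a finite collection into the classes of an equivalence relation.

    `related(x, y)` must be reflexive, symmetric and transitive; then, as in
    Theorem 6.5.1, each element lies in exactly one class, and a class is
    determined by any one of its elements (here, its first member).
    """
    classes = []
    for x in elements:
        for cls in classes:
            if related(cls[0], x):
                cls.append(x)
                break
        else:                       # x is equivalent to no earlier element
            classes.append([x])
    return classes


# Congruence modulo 3 on the integers 0, 1, ..., 9.
print(equivalence_classes(range(10), lambda a, b: (a - b) % 3 == 0))
# [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]

# Pairs (a, b) with b != 0, where (a, b) ~ (c, d) whenever ad = bc.
pairs = [(1, 2), (1, 3), (2, 4), (-2, -6), (2, 3), (3, 6)]
print(equivalence_classes(pairs, lambda p, q: p[0] * q[1] == p[1] * q[0]))
# [[(1, 2), (2, 4), (3, 6)], [(1, 3), (-2, -6)], [(2, 3)]]
```

If the relation supplied is not transitive, the output depends on the order of the elements. This is one practical reason why all three axioms matter.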

Exercise 6.5.2. What are the equivalence classes when equivalence is defined as (i) equality between numbers, (ii) congruence modulo m?

Exercise 6.5.3. Let $\{x_n\}$, $\{y_n\}$ denote convergent sequences, and write $\{x_n\} \sim \{y_n\}$ whenever $x_n - y_n \to 0$ as $n \to \infty$. Show that the relation ~ is an equivalence relation and determine the equivalence classes.

6.5.2. Equivalence relations may often be associated with sets 
of operators. 

Theorem 6.5.2. Let S be a set of elements and Φ a set of transformations of S into itself with the following properties.

(i) Φ contains the identity operator, which transforms every element of S into itself.

(ii) If $x, y \in S$, $\Omega \in \Phi$, and $\Omega x = y$, then there exists an operator $\Omega^* \in \Phi$ such that $\Omega^* y = x$.

(iii) If $x, y, z \in S$, $\Omega, \Omega' \in \Phi$, and $\Omega x = y$, $\Omega' y = z$, then there exists an operator $\Omega'' \in \Phi$ such that $\Omega'' x = z$.

Suppose further that, for any $x, y \in S$, we write $x \sim y$ if and only if there exists some $\Omega \in \Phi$ such that $\Omega x = y$. Then ~ is an equivalence relation in S.

The relation ~ is reflexive since, in view of (i), $x \sim x$ for all $x \in S$. Suppose next that $x \sim y$, i.e. $\Omega x = y$ for some $\Omega \in \Phi$. Then, by (ii), there exists some $\Omega^* \in \Phi$ such that $\Omega^* y = x$, i.e. $y \sim x$. This proves the symmetry of ~. Finally, let $x \sim y$, $y \sim z$. This means that, for some $\Omega, \Omega' \in \Phi$, $\Omega x = y$ and $\Omega' y = z$. Hence, by (iii), there exists some $\Omega'' \in \Phi$ such that $\Omega'' x = z$, i.e. $x \sim z$. Thus ~ is transitive, and therefore it satisfies all three axioms of Definition 6.5.1.

† For details see Landau, Foundations of Analysis, chap. ii.

Definition 6.5.3. The equivalence relation ~ of the preceding theorem is called EQUIVALENCE IN S WITH RESPECT TO THE SET Φ OF OPERATORS.

In the case of an equivalence with respect to a set of operators, 
an equivalence class simply consists of all elements which can be 
transformed into a specified element by means of suitably chosen 
operators. 

Several examples of equivalence with respect to a set of operators are implicit in our earlier discussion. Thus, let S be the set of $m \times n$ matrices and Φ the set of equivalence transformations (in the sense of Definition 6.2.3, p. 177). Then equivalence between matrices can be interpreted as equivalence in S with respect to Φ; and the equivalence classes consist, as we know, of $m \times n$ matrices having the same rank. Again, let S be the set of $n \times n$ matrices. Similarity is then an equivalence relation in S, the equivalence classes being sets of similar matrices. This equivalence is, in fact, equivalence in S with respect to the set of similarity transformations. An analogous statement applies to congruence transformations.

Exercise 6.5.4. Verify these statements. 

In our subsequent discussion in Parts II and III we shall meet 
many further instances of equivalence relations, equivalence 
classes, and equivalence with respect to sets of operators. The 
reader’s attention is directed particularly to the remarks in § 9.2.3. 


6.6. Axiomatic characterization of determinants 

The technique evolved in studying elementary operations 
enables us to throw further light on the theory of determinants. 
In Chapter I we were concerned with deriving a series of properties 
of determinants, and our object now will be to prove that three of 
the simplest of these properties suffice to characterize completely 
the nature of determinants. We shall, in fact, show that if a 
particular numerical function of a square matrix is known to 
satisfy certain simple conditions stated below, then it is equal to 
the determinant of that matrix. 

We require, in the first place, a preliminary result on certain matrix transformations which closely resemble E-operations.

Theorem 6.6.1. Any non-singular square matrix can be transformed into a diagonal matrix by a chain of operations of the following two types.

(i) Addition of any multiple of a row to another row.

(ii) Interchange of two rows followed by the multiplication of one of them by −1.

It may be noted in passing that the relation between two 
matrices which can be transformed into each other by chains of 
operations enumerated above is a relation of equivalence. Theorem 
6.6.1 asserts, in fact, that every non-singular matrix is equivalent 
(in the sense specified) to a diagonal matrix. Since, moreover, the 
determinant of a matrix clearly remains unaffected by either of the 
two operations, it follows that the diagonal matrix in question is 
also non-singular. 

To effect the transformation whose existence is asserted by the theorem we rely essentially upon the method established in § 6.2.2 and § 6.3.1. We denote, as usual, the order of the matrices by n. Since the given matrix is non-singular the first column contains at least one non-zero element, and if necessary we can use an operation of type (ii) to bring this non-zero element into the leading position. Next, subtracting suitable multiples of the first row from the other rows we can reduce to zero the last $n - 1$ elements in the first column. Proceeding in a similar manner we obtain eventually an upper triangular matrix whose diagonal elements are all non-zero. We now subtract suitable multiples of the last row from the first $n - 1$ rows and reduce to zero the first $n - 1$ elements of the last column. Treating the other columns in a similar manner we gradually reduce to zero all elements above the diagonal and are then left with a diagonal matrix.
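The reduction in the proof of Theorem 6.6.1 can be carried out mechanically. In the Python sketch below, which uses the hypothetical name `diagonalize_restricted`, the only operations are the two types allowed by the theorem. Neither type changes the determinant, so the product of the diagonal elements obtained must equal |A|. The sketch checks this against a cofactor expansion, `det`, written out for the purpose.

```python
from fractions import Fraction


def det(M):
    """Determinant by expansion along the first row (used only as a check)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))


def diagonalize_restricted(A):
    """Reduce a non-singular matrix to diagonal form using only
    (i)  R_r -> R_r + c R_s  (r != s), and
    (ii) R_r <-> R_s followed by R_s -> -R_s."""
    M = [[Fraction(x) for x in row] for row in A]
    n = len(M)
    for j in range(n):
        p = next((i for i in range(j, n) if M[i][j] != 0), None)
        if p is None:
            raise ValueError("matrix is singular")
        if p != j:                                     # operation (ii)
            M[j], M[p] = M[p], [-x for x in M[j]]
        for i in range(j + 1, n):                      # operations (i) below the diagonal
            c = -M[i][j] / M[j][j]
            M[i] = [x + c * y for x, y in zip(M[i], M[j])]
    for j in range(n - 1, 0, -1):                      # operations (i) above the diagonal
        for i in range(j):
            c = -M[i][j] / M[j][j]
            M[i] = [x + c * y for x, y in zip(M[i], M[j])]
    return M


A = [[0, 2, 1], [1, 1, 0], [3, 0, 4]]
D = diagonalize_restricted(A)
assert all(D[i][j] == 0 for i in range(3) for j in range(3) if i != j)
assert D[0][0] * D[1][1] * D[2][2] == det(A)           # both equal -11
```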

Exercise 6.6.1. Show that the conclusion of Theorem 6.6.1 need not be 
valid for a singular matrix. 

We can now prove the principal result of this section. We shall consider a numerical function $f(A)$ of a square matrix A and, assuming that it satisfies certain conditions which are also satisfied by $|A|$, we shall show that it is equal to $|A|$.

Theorem 6.6.2. Let $f(A)$ be a numerical function of the square matrix A, and suppose that it satisfies the following conditions.

(i) The value of $f(A)$ remains unchanged if any operation of the type $R_r \to R_r + R_s$ $(r \ne s)$ is performed on A.

(ii) If any row of A is multiplied by a constant λ, then the value of $f(A)$ is also multiplied by λ.

(iii) $f(I) = 1$.

We then have

$$f(A) = |A|. \tag{6.6.1}$$

Our first step consists in showing that the value of $f(A)$ remains unchanged when A is subjected to either of the operations listed in Theorem 6.6.1.

Let $r \ne s$, $\lambda \ne 0$, and suppose that A is subjected in turn to the operations $R_s \to \lambda R_s$, $R_r \to R_r + R_s$, $R_s \to \lambda^{-1} R_s$, the resulting matrices being denoted by $A_1, A_2, A_3$ respectively. Then, by conditions (i) and (ii),

$$f(A_3) = \lambda^{-1} f(A_2) = \lambda^{-1} f(A_1) = \lambda^{-1} \lambda f(A) = f(A).$$

Now $A_3$ is obtained from A by adding λ times the sth row to the rth row; since for $\lambda = 0$ there is nothing to prove, it follows that f is invariant under operations of type (i).

Again, let $r \ne s$ and suppose that A is subjected, in turn, to the operations $R_r \to R_r + R_s$, $R_s \to R_s - R_r$, $R_r \to R_r + R_s$, the resulting matrices being denoted by $A_1, A_2, A_3$ respectively. Then, by the result just proved, we have

$$f(A_3) = f(A_2) = f(A_1) = f(A).$$

Now $A_3$ is obtained from A by interchanging the rth and sth rows and then multiplying the sth row by −1. Hence f is invariant under operations of type (ii).

Now let A be a given non-singular matrix. By virtue of Theorem 6.6.1, A can be transformed by a chain of operations of types (i) and (ii) into a matrix $\Lambda = \mathrm{dg}(\lambda_1, \ldots, \lambda_n)$, where $\lambda_1 \ne 0, \ldots, \lambda_n \ne 0$. Hence $f(A) = f(\Lambda)$. But, by conditions (ii) and (iii), and since neither type of operation changes the determinant,

$$f(\Lambda) = \lambda_1 \cdots \lambda_n f(I) = \lambda_1 \cdots \lambda_n = |\Lambda| = |A|,$$

and therefore (6.6.1) is valid for non-singular matrices.

If, on the other hand, A is singular then its rows are linearly dependent and one of them, say the kth, is expressible as a linear combination of the other rows. Hence a chain of suitable operations of type (i) will reduce all elements in the kth row to zero. Denoting the matrix thus obtained by $A'$ we have $f(A) = f(A')$. But multiplying the kth row of $A'$, which consists entirely of zeros, by a constant $\lambda \ne 1$ leaves $A'$ unchanged, so that condition (ii) gives $f(A') = \lambda f(A')$, i.e. $f(A') = 0$. Consequently $f(A) = 0$ and (6.6.1) remains valid for singular matrices. The proof is therefore complete.



It should be noted that even if all references to determinants were suppressed, the above argument would still show that a function $f(A)$ which satisfies conditions (i)-(iii) is unique. This suggests that the entire theory of determinants could be built up in a way quite distinct from that adopted in Chapter I. The basis of the new method would be the definition of the determinant of A as $f(A)$. We do not propose here to enter into further details but may mention in conclusion that this programme was carried out about seventy years ago by Weierstrass.†


PROBLEMS ON CHAPTER VI 


1. Let $f(x), g(x), \ldots$ be functions defined for $0 \le x \le 1$, and write $f \sim g$ whenever bd $|f(x) - g(x)| < 1$. Show that ~ is not an equivalence relation.

2. Criticize the following argument. ‘Conditions (ii) and (iii) of Definition 6.5.1 (p. 186) imply condition (i), for $x \sim y$ implies $y \sim x$, and these two relations imply $x \sim x$.’

3. Express the matrix 


as a product of E-matrices.

4. Find the inverse of 


4 3 
1 1 
I 1 


/I 1 1 
/ 0 1 0 
ll 1 2 

\2 2 4 



5. Determine the normal forms of the matrices


(i) / 6 -1 0 4\ (ii) /-5 10 6\ 

3 6 11; I 1 -2 “M 

\-l 0 -2 2/ 1 -2 4 2 r 

\ 2 -1 1 / 

6. Solve completely each of the following systems of equations, 
(i) 4a;+6t/+32;4-4t(; = 3, 

2a;-f 42/+2 ;+w; = — 1, 
3a;-f4y+3z+4t/; — 4, 

2a;-|-3t/+z+w = 0. 


(ii) 2a;—y+3z—6W = — 7, 
a;+3y+«; = 3, 

— 7a;-|-6y+42 —-m; = —3, 


† See Carathéodory, Vorlesungen über reelle Funktionen (2nd edition), 318-26; Artin, Galois Theory (2nd edition), 11-20; Schreier and Sperner, 4, 68-83; Lichnerowicz, 17, 26-39.





(iii) Zx-\'Zy—z-\‘^u—2v = 14, 
x—y-\-lz—u = — 2 , 
^x+y-\-nz-\-2u-2v = 10, 

2354-41/—82+6tt—2v ~ 16. 

7. Let A, B be two diagonal $n \times n$ matrices whose diagonal elements are identical except possibly for order. Show that there exists a non-singular symmetric matrix H such that $HAH = B$.

8. Show that the reduction to normal form of an $m \times n$ matrix can always be effected by fewer than $(m+1)(n+1)$ E-operations.

9. Show that any rectangular matrix of rank r (> 1) can be expressed 
as the sum of r matrices of rank 1. 

10. By using the technique of E-operations prove that (i) a system of n homogeneous linear equations in n unknowns possesses a non-trivial solution if and only if the matrix of coefficients is singular; (ii) a system of m homogeneous linear equations in $n > m$ unknowns always possesses a non-trivial solution.

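11. Show that every non-singular 2×2 matrix can be represented as a product of matrices of the following types:

$$\begin{pmatrix} \alpha & 0 \\ 0 & 1 \end{pmatrix}, \quad \begin{pmatrix} 1 & 0 \\ 0 & \alpha \end{pmatrix}, \quad \begin{pmatrix} 1 & \lambda \\ 0 & 1 \end{pmatrix}, \quad \begin{pmatrix} 1 & 0 \\ \lambda & 1 \end{pmatrix}.$$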

Obtain such a representation for the matrix 

12. Show that an E-operation of type (i) can be expressed as a product of E-operations of types (ii) and (iii). Hence obtain a refinement of the result stated in the preceding question.

13. Show that every square matrix can be represented as a product of 
triangular matrices. 

14. Show that if A is any square matrix, then there exist non-singular matrices P and Q such that PA and AQ are triangular.

15. Let the $n \times n$ matrix $A = (a_{ij})$ be of rank r and suppose that it satisfies the equation $A^2 = A$. Use Theorem 6.2.4 (p. 177) to prove that

$$\sum_{i=1}^{n} a_{ii} = r.$$

16. Use Theorem 6.2.4 (p. 177) to prove Theorem 5.6.5 (p. 162) for the case of square matrices.

17. By a modification of the proof of Theorem 6.4.1 (p. 183) show that, if A is a matrix such that $\bar{A}^T = A$, then there exists a non-singular matrix P such that $\bar{P}^T A P$ is diagonal.

18. A rectangular matrix A has an $r \times r$ submatrix B such that (i) B is non-singular, (ii) every $(r+1) \times (r+1)$ submatrix of A which contains B is singular. Prove that $R(A) = r$.

19. Find the rank of

$$\begin{pmatrix} 0 & c & -b & a' \\ -c & 0 & a & b' \\ b & -a & 0 & c' \\ -a' & -b' & -c' & 0 \end{pmatrix},$$

where $aa' + bb' + cc' = 0$, and a, b, c are positive numbers.





20. By an elementary integral operation on a matrix we shall understand 
an operation of one of the following types: (i) interchange of any two rows 
(or columns); (ii) addition of any integral multiple of one row (or column) to 
another row (or column); (iii) multiplication of any row (or column) by — 1. 

Let A be a rectangular non-zero matrix whose elements are integers, and 
let [A] denote the minimum of the absolute values of the non-zero elements 
of A. Show that, if [A] does not divide every element of A, then A can be 
transformed, by a chain of elementary integral operations, into a matrix B 
such that [B] < [A]. Deduce that any non-zero matrix with integral elements can be transformed, by a chain of elementary integral operations, into a matrix C such that [C] divides all elements of C.

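21. Prove that every rectangular matrix A with integral elements and of rank $r > 0$ can be transformed, by a chain of elementary integral operations, into a matrix

$$D = \begin{pmatrix} d_1 & \cdot & \cdots & \cdot & \cdot & \cdots & \cdot \\ \cdot & d_2 & \cdots & \cdot & \cdot & \cdots & \cdot \\ \vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\ \cdot & \cdot & \cdots & d_r & \cdot & \cdots & \cdot \\ \cdot & \cdot & \cdots & \cdot & \cdot & \cdots & \cdot \\ \vdots & \vdots & & \vdots & \vdots & \ddots & \vdots \\ \cdot & \cdot & \cdots & \cdot & \cdot & \cdots & \cdot \end{pmatrix},$$

where dots indicate zeros and $d_1, \ldots, d_r$ are positive integers such that, for $1 \le k \le r - 1$, $d_k$ divides $d_{k+1}$. Show, furthermore, that

$$d_k = h_k / h_{k-1} \qquad (k = 1, \ldots, r),$$

where $h_0 = 1$ and, for $1 \le k \le r$, $h_k$ is the highest common factor of all k-rowed minors of A. (The numbers $d_1, \ldots, d_r$ are called the elementary divisors of A.)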

Let two matrices with integral elements be called equivalent if one can 
be transformed into the other by a chain of elementary integral operations. 
Give necessary and sufficient conditions for two matrices to be equivalent. 

22. A square matrix is said to be unimodular if its elements are integers and its determinant is ±1. Show that, if A is any given $m \times n$ matrix with integral elements and of rank r, then there exist unimodular matrices P, Q of order m, n respectively, such that $PAQ = D$, where D is the matrix defined in the preceding question.







PART II


FURTHER DEVELOPMENT OF MATRIX 
THEORY 

VII 

THE CHARACTERISTIC EQUATION 

All results obtained in Part I have been essentially simple and may be said to have followed almost automatically from the two notions of matrix and linear dependence. The introduction of a new and powerful idea, that of the characteristic equation, permits the further development of the subject at a somewhat less elementary level, and enables us to deal with a series of problems which are beyond the range of the methods employed previously. The idea of the characteristic equation will be seen to underlie virtually all subsequent discussion of matrices and quadratic forms in the remaining parts of the book.

From now on, unless the contrary is stated, all vectors are assumed to be of order n and all matrices of type $n \times n$; moreover, both vectors and matrices are assumed to be complex.

7.1. The coefficients of the characteristic polynomial 

7.1.1. Definition 7.1.1. Let $A = (a_{ij})$ be an $n \times n$ matrix and λ a scalar variable. The characteristic polynomial of A is the polynomial $\chi(\lambda) = \chi(A; \lambda)$ given by†

$$\chi(\lambda) = |\lambda I - A| = \begin{vmatrix} \lambda - a_{11} & -a_{12} & \cdots & -a_{1n} \\ -a_{21} & \lambda - a_{22} & \cdots & -a_{2n} \\ \vdots & \vdots & & \vdots \\ -a_{n1} & -a_{n2} & \cdots & \lambda - a_{nn} \end{vmatrix}.$$

The characteristic equation of A is the equation $\chi(\lambda) = 0$. Its roots‡ are called the CHARACTERISTIC (or LATENT) ROOTS of A.

The polynomial $\chi(\lambda)$ is evidently of degree n; in fact, its leading term is $\lambda^n$. Thus the characteristic equation has precisely n roots;

† We have, in fact, already made use of this polynomial in § 3.6.

‡ The existence of the roots is guaranteed by the fundamental theorem of algebra.




196 THE CHARACTERISTIC EQUATION Vn,§7.1 

but these need not, of course, be all distinct. If a certain root occurs precisely $k$ times it is said to be a $k$-fold root, or a root of multiplicity $k$. A $k$-fold root is called simple or multiple according as $k = 1$ or $k > 1$. We shall denote by $m_A(\lambda)$ the multiplicity of the number $\lambda$ as a characteristic root of $A$, with the convention that $m_A(\lambda) = 0$ if $\lambda$ is not a characteristic root of $A$ at all.

Exercise 7.1.1. Show that different matrices need not have different 
characteristic polynomials. 

The characteristic equation was first investigated for general matrices by Cayley in 1858. It occurs in many different contexts, of which one may be indicated here. Consider a collineation of projective space of $n - 1$ dimensions. In this space each point is represented by a non-zero vector of order $n$, and two vectors which only differ by a non-zero scalar multiple represent the same point. For a given choice of the coordinate system the collineation may be specified by an $n \times n$ matrix $A$ such that each point $x$ is transformed into $Ax$. It is often important to obtain information about the united points of the collineation, i.e. points which are transformed into themselves. Now the vector $x \ne 0$ represents such a point if and only if there exists some $\lambda \ne 0$ such that $Ax = \lambda x$, i.e. $(\lambda I - A)x = 0$. By the theory of homogeneous linear equations we know that an $x \ne 0$ which satisfies this equation exists if and only if $|\lambda I - A| = 0$. Thus the analysis of collineations leads at once to the study of the characteristic equation.

There are many other topics in both pure and applied mathematics where the characteristic equation makes its appearance.† Historically, one of its first applications occurs in the theory of secular perturbations of planetary motion. It is for this reason that the equation was referred to by earlier writers as the secular equation.

The determination of the characteristic roots of a matrix involves 
the solution of an equation of the nth degree, but in a few special 
cases the roots can be found by inspection. Thus, if A is a diagonal 
matrix, or even a triangular matrix, its n characteristic roots are 
equal to the n diagonal elements. 

7.1.2. A simple criterion for the singularity of a matrix can 
be deduced from the following result. 


† See, for example, § 12.2.3.




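Theorem 7.1.1. If $\lambda_1, \dots, \lambda_n$ are the characteristic roots of $A$, then

$$|A| = \lambda_1 \cdots \lambda_n.$$

We have $|\lambda I - A| = (\lambda - \lambda_1) \cdots (\lambda - \lambda_n)$. Hence, putting $\lambda = 0$, we obtain

$$|-A| = (-1)^n \lambda_1 \cdots \lambda_n,$$

and the assertion follows, since $|-A| = (-1)^n |A|$.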

Corollary. A matrix is singular if and only if at least one of its characteristic roots is equal to zero.

The problem of determining the precise number of zero characteristic roots of a matrix will be considered in § 10.2.1.

Theorem 7.1.1 relates the value of the constant term in $\chi(\lambda)$ to the value of a minor of $A$, namely $|A|$ itself. It is not difficult to obtain analogous results for all coefficients of $\chi(\lambda)$.

Definition 7.1.2. A principal minor of a matrix $A$ is a minor whose diagonal is part of the diagonal of $A$.

Thus a principal minor of $A$ is obtained by selecting rows and columns with the same suffixes. Special cases of the principal minors of $A = (a_{rs})$ are the diagonal elements $a_{11}, \dots, a_{nn}$ and the determinant $|A|$.

Theorem 7.1.2. For $0 \le r \le n - 1$, the coefficient of $\lambda^r$ in the characteristic polynomial $\chi(\lambda)$ of $A$ is equal to $(-1)^{n-r}$ times the sum of all $(n-r)$-rowed principal minors of $A$.

We note that for r = 0 this result reduces, effectively, to 
Theorem 7.1.1. 

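We have $(-1)^n \chi(\lambda) = |A - \lambda I|$.

Writing the determinant on the right in the form

$$\begin{vmatrix} a_{11} - \lambda & a_{12} + 0 & \cdots & a_{1n} + 0 \\ a_{21} + 0 & a_{22} - \lambda & \cdots & a_{2n} + 0 \\ \vdots & \vdots & & \vdots \\ a_{n1} + 0 & a_{n2} + 0 & \cdots & a_{nn} - \lambda \end{vmatrix}$$

and using the corollary to Theorem 1.2.5 (p. 11), we obtain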

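$$(-1)^n \chi(\lambda) = |-\lambda I| + \sum_{k=1}^{n} \sum_{1 \le i_1 < \dots < i_k \le n} D(i_1, \dots, i_k),$$

where $D(i_1, \dots, i_k)$ denotes the $n$-rowed determinant whose columns with suffixes $i_1, \dots, i_k$ are identical with the corresponding columns in $A$ and whose remaining columns (if any exist) are identical with the corresponding columns in $-\lambda I$. Hence, expanding by the columns taken from $-\lambda I$, we have for $1 \le k \le n$

$$D(i_1, \dots, i_k) = (-\lambda)^{n-k} A(i_1, \dots, i_k),$$

where $A(i_1, \dots, i_k)$ is the $k$-rowed (principal) minor of $A$ obtained by deleting from $A$ all rows and columns other than those with suffixes $i_1, \dots, i_k$. Thus

$$(-1)^n \chi(\lambda) = (-1)^n \lambda^n + \sum_{k=1}^{n} (-\lambda)^{n-k} \sum_{1 \le i_1 < \dots < i_k \le n} A(i_1, \dots, i_k),$$

and so

$$\chi(\lambda) = \lambda^n + \sum_{k=1}^{n} (-1)^k \lambda^{n-k} \sum_{1 \le i_1 < \dots < i_k \le n} A(i_1, \dots, i_k) = \lambda^n + \sum_{k=1}^{n} (-1)^k A_k \lambda^{n-k},$$

where $A_k$ is the sum of all $k$-rowed principal minors of $A$. The theorem now follows at once.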
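Theorem 7.1.2 also gives a direct, if inefficient, way of computing $\chi(\lambda)$ for small matrices. The Python sketch below illustrates it under stated assumptions and is not a method taken from the text. It uses exact rational arithmetic from the standard `fractions` module, and the helper names `det` and `char_poly_coeffs` are hypothetical. Because the number of principal minors grows like $2^n$, it is only suitable for checking small examples.

```python
from fractions import Fraction
from itertools import combinations


def det(M):
    """Determinant of a square matrix by Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    n = len(M)
    result = Fraction(1)
    for c in range(n):
        # choose a pivot row with a non-zero entry in column c
        p = next((r for r in range(c, n) if M[r][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            M[c], M[p] = M[p], M[c]
            result = -result  # interchanging two rows changes the sign
        result *= M[c][c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            M[r] = [a - factor * b for a, b in zip(M[r], M[c])]
    return result


def char_poly_coeffs(A):
    """Return [1, c_1, ..., c_n] with chi(lambda) = lambda^n + c_1 lambda^(n-1) + ... + c_n,
    where c_k = (-1)^k times the sum of all k-rowed principal minors of A."""
    n = len(A)
    coeffs = [Fraction(1)]
    for k in range(1, n + 1):
        total = sum(det([[A[i][j] for j in idx] for i in idx])
                    for idx in combinations(range(n), k))
        coeffs.append((-1) ** k * total)
    return coeffs


A = [[1, 0, 2], [0, -1, 1], [0, 1, 0]]
print(char_poly_coeffs(A))  # coefficients of lambda^3 - 2*lambda + 1
```

For a triangular matrix the routine returns the coefficients of $(\lambda - a_{11}) \cdots (\lambda - a_{nn})$, in agreement with the remark at the end of § 7.1.1.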

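Exercise 7.1.2. Let $1 \le k \le n$, $1 \le i_1 < \dots < i_k \le n$. Show that the coefficient of $\lambda_{i_1} \cdots \lambda_{i_k}$ in

$$\begin{vmatrix} a_{11} - \lambda_1 & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} - \lambda_2 & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} - \lambda_n \end{vmatrix}$$

is equal to $(-1)^k \hat{A}(i_1, \dots, i_k)$, where $\hat{A}(i_1, \dots, i_k)$ denotes the $(n-k)$-rowed (principal) minor of $A$ obtained by deleting the rows and columns with suffixes $i_1, \dots, i_k$. Hence deduce Theorem 7.1.2.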

Theorem 7.1.3. Let $1 \le r \le n$. Then the $r$-th elementary symmetric function† of the characteristic roots of $A$ is equal to the sum of all $r$-rowed principal minors of $A$.

This result follows immediately from Theorem 7.1.2 and the 
well-known relations between the coefficients and the roots of an 
algebraic equation. 

Definition 7.1.3. The trace of a matrix $A$ is the sum of its diagonal elements. It is denoted by $\operatorname{tr} A$.

Exercise 7.1.3. Show that the function $\operatorname{tr} X$ is a linear operator in the linear manifold of $n \times n$ matrices.

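† The $r$-th elementary symmetric function of the numbers $\lambda_1, \dots, \lambda_n$ (where $1 \le r \le n$) is defined as

$$\sum_{1 \le i_1 < \dots < i_r \le n} \lambda_{i_1} \lambda_{i_2} \cdots \lambda_{i_r}.$$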





The special case $r = 1$ of Theorem 7.1.3 is worth stating explicitly.

Theorem 7.1.4. The trace of a matrix is equal to the sum of its 
characteristic roots. 

Exercise 7.1.4. Write out an independent proof of Theorem 7.1.4. 

7.2. Characteristic polynomials and similarity transformations

The significance of similarity transformations has already been 
discussed in § 4.2. We shall now note some of their properties in 
relation to characteristic polynomials. 

Theorem 7.2.1. The characteristic polynomial, and therefore the characteristic roots, of a matrix are invariant under similarity transformations.

For we have

$$|\lambda I - S^{-1}AS| = |S^{-1}(\lambda I - A)S| = |S^{-1}|\,|\lambda I - A|\,|S| = |\lambda I - A|.$$

Theorem 7.2.1, though proved in a single line, is important. It 
shows, in fact, that the characteristic roots are numbers which 
pertain not merely to a matrix but to the linear operator of which 
that matrix is a representation. We may consequently speak of 
characteristic roots of linear operators. 

Theorem 7.2.1 implies, in particular, that similar matrices have 
the same determinant. Moreover, in view of Theorem 7.1.4, we 
can make the following inference from Theorem 7.2.1. 

Corollary. The trace of a matrix is invariant under similarity transformations, i.e.

$$\operatorname{tr}(S^{-1}AS) = \operatorname{tr} A.$$

Exercise 7.2.1. Show that a matrix and its transpose have the same 
characteristic equation. 

Theorem 7.2.2. The diagonal matrices $\operatorname{dg}(\alpha_1, \dots, \alpha_n)$ and $\operatorname{dg}(\beta_1, \dots, \beta_n)$ are similar if and only if $(\alpha_1, \dots, \alpha_n)$ is an arrangement of $(\beta_1, \dots, \beta_n)$.

Denote the two matrices by $A$ and $B$ respectively and suppose, in the first place, that they are similar. Their characteristic polynomials are $(\lambda - \alpha_1) \cdots (\lambda - \alpha_n)$ and $(\lambda - \beta_1) \cdots (\lambda - \beta_n)$ respectively; these polynomials are, by Theorem 7.2.1, identical, and hence $(\alpha_1, \dots, \alpha_n)$ is an arrangement of $(\beta_1, \dots, \beta_n)$.

Next, assume that $(\alpha_1, \dots, \alpha_n)$ is an arrangement of $(\beta_1, \dots, \beta_n)$. Denote by $E$ the matrix obtained when the $i$-th and $j$-th rows of the unit matrix are interchanged. Then $E = E^{-1}$, and it follows at once from Theorem 6.1.2 (p. 170) that $E^{-1}AE$ is the matrix obtained from $A$ when the elements $\alpha_i$ and $\alpha_j$ are interchanged.† Since any arrangement can be reached by a succession of such interchanges, matrices $E, \dots, G$ of this type may be chosen such that

$$G^{-1} \cdots E^{-1} A E \cdots G = B,$$

i.e.

$$(E \cdots G)^{-1} A (E \cdots G) = B,$$

and $A$ and $B$ are therefore similar.

Theorem 7.2.3. If A, B are arbitrary matrices, then AB and 
BA have the same characteristic polynomial. 

In proving this result (which is due to Sylvester) we shall employ 
an idea whose usefulness was already recognized in § 3.5. 

If $|A| \ne 0$, then $BA = A^{-1}(AB)A$. In that case $AB$ and $BA$ are similar matrices, and the assertion follows by Theorem 7.2.1. In the general case we have $|A - tI| \ne 0$ for all sufficiently large values of $t$, say for $t > t_0$. In view of the previous case it therefore follows that, for $t > t_0$, the matrices

$$(A - tI)B, \qquad B(A - tI)$$

have the same characteristic polynomial. Thus

$$|\lambda I - (A - tI)B| = |\lambda I - B(A - tI)|$$

for all $t > t_0$ and all $\lambda$. For every fixed value of $\lambda$ we have, therefore, two polynomials in $t$ which are equal whenever $t > t_0$. Hence, by the corollary to Theorem 1.6.3 (p. 31), we have equality for $t = 0$, i.e.

$$|\lambda I - AB| = |\lambda I - BA|$$

for all values of $\lambda$.

The required result can be established more rapidly if we are prepared to make use of the notion of continuity.‡ For, if $c_r(X)$ denotes the coefficient of $\lambda^r$ in the characteristic polynomial of $X$, then clearly

$$c_r(AB) = c_r(BA) \qquad (r = 0, 1, \dots, n-1) \tag{7.2.1}$$

provided that $A$ is non-singular. But $c_r(X)$ is a continuous function of the elements of $X$. Moreover, a singular $A$ is the limit of the matrices $A - tI$ as $t \to 0$, and these are non-singular for all sufficiently small $t \ne 0$. Hence (7.2.1) remains valid even when $A$ is singular.
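Theorem 7.2.3 can be checked on a case in which $A$ is singular, where the similarity argument of the first paragraph does not apply directly. Below is a minimal Python sketch, assuming exact arithmetic with `fractions.Fraction`. The helpers `det`, `char_poly_coeffs` and `mat_mul` and the two matrices are illustrative choices, not taken from the text. A check of this kind illustrates the theorem; it does not prove it.

```python
from fractions import Fraction
from itertools import combinations


def det(M):
    """Determinant by Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    n, result = len(M), Fraction(1)
    for c in range(n):
        p = next((r for r in range(c, n) if M[r][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            M[c], M[p] = M[p], M[c]
            result = -result          # a row interchange changes the sign
        result *= M[c][c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            M[r] = [a - factor * b for a, b in zip(M[r], M[c])]
    return result


def char_poly_coeffs(A):
    """[1, c_1, ..., c_n]: c_k is (-1)^k times the sum of the k-rowed principal minors."""
    n = len(A)
    coeffs = [Fraction(1)]
    for k in range(1, n + 1):
        total = sum(det([[A[i][j] for j in idx] for i in idx])
                    for idx in combinations(range(n), k))
        coeffs.append((-1) ** k * total)
    return coeffs


def mat_mul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][t] * Y[t][j] for t in range(n)) for j in range(n)] for i in range(n)]


A = [[1, 2], [2, 4]]   # |A| = 0, so A is singular
B = [[0, 1], [3, 5]]
AB, BA = mat_mul(A, B), mat_mul(B, A)
print(AB, BA)                                        # two different matrices ...
print(char_poly_coeffs(AB) == char_poly_coeffs(BA))  # ... with the same chi
```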

† The reader who is not acquainted with the theory of E-operations may regard the proof of this statement as an exercise.

‡ For yet another proof see MacDuffee, 21, 23-24.




Corollary. If $(B_1, \dots, B_k)$ is any cyclic arrangement† of the ordered set of matrices $(A_1, \dots, A_k)$, then $A_1 \cdots A_k$ and $B_1 \cdots B_k$ have the same characteristic polynomial.

Exercise 7.2.2. (i) Deduce from Theorem 7.2.3 that

$$\operatorname{tr}(AB) = \operatorname{tr}(BA), \tag{7.2.2}$$

and also give an independent proof of this result. (ii) Use (7.2.2) to deduce the corollary to Theorem 7.2.1. (iii) Extend (7.2.2) to products of any number of matrices.

7.3. Characteristic roots of rational functions of matrices 

Polynomials and rational functions of a matrix A were defined 
in § 3.7. We shall now discuss the relation between the characteristic roots of such matrix functions and those of the original matrix $A$. The results derived below are due to Frobenius (1878).

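Theorem 7.3.1. If $f$ is a polynomial and $\lambda_1, \dots, \lambda_n$ are the characteristic roots of $A$, then

$$|f(A)| = f(\lambda_1) \cdots f(\lambda_n).$$

Denote the characteristic polynomial of $A$ by $\chi(\lambda; A)$, and write

$$f(t) = c(\alpha_1 - t) \cdots (\alpha_k - t).$$

Since, as we know by Theorem 3.7.2 (p. 98), any identity between scalar polynomials implies the corresponding identity between matrix polynomials, we have

$$f(A) = c(\alpha_1 I - A) \cdots (\alpha_k I - A).$$

Hence

$$|f(A)| = c^n \prod_{i=1}^{k} |\alpha_i I - A| = c^n \prod_{i=1}^{k} \chi(\alpha_i; A) = c^n \prod_{i=1}^{k} \prod_{j=1}^{n} (\alpha_i - \lambda_j) = \prod_{j=1}^{n} \Big\{ c \prod_{i=1}^{k} (\alpha_i - \lambda_j) \Big\} = \prod_{j=1}^{n} f(\lambda_j),$$

and the assertion is proved.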
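Here is a small numerical illustration of Theorem 7.3.1. It assumes a matrix whose characteristic roots are known in advance: $A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$ has $\chi(\lambda) = \lambda^2 - 4\lambda + 3$, so its roots are 1 and 3. The Python helpers below (`det`, `mat_mul`, `poly_at_matrix`, `poly_at_scalar`) are hypothetical names. Polynomials are given as coefficient lists from the highest power down and are evaluated by Horner's rule.

```python
from fractions import Fraction


def det(M):
    """Determinant by Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    n, result = len(M), Fraction(1)
    for c in range(n):
        p = next((r for r in range(c, n) if M[r][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            M[c], M[p] = M[p], M[c]
            result = -result
        result *= M[c][c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            M[r] = [a - factor * b for a, b in zip(M[r], M[c])]
    return result


def mat_mul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][t] * Y[t][j] for t in range(n)) for j in range(n)] for i in range(n)]


def poly_at_matrix(coeffs, A):
    """f(A) by Horner's rule; coeffs run from the highest power of f down."""
    n = len(A)
    P = [[0] * n for _ in range(n)]
    for c in coeffs:
        P = mat_mul(P, A)            # P <- P A
        for i in range(n):
            P[i][i] += c             # P <- P A + c I
    return P


def poly_at_scalar(coeffs, t):
    """f(t) by Horner's rule."""
    value = 0
    for c in coeffs:
        value = value * t + c
    return value


A = [[2, 1], [1, 2]]   # chi = lambda^2 - 4 lambda + 3, roots 1 and 3
roots = [1, 3]
f = [1, 0, -2, 5]      # f(t) = t^3 - 2t + 5
lhs = det(poly_at_matrix(f, A))
rhs = 1
for lam in roots:
    rhs *= poly_at_scalar(f, lam)
print(lhs, rhs)
```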

Exercise 7.3.1. Show that Theorem 7.1.1 is a special case of Theorem 
7.3.1. 

Theorem 7.3.2. The conclusion of Theorem 7.3.1 is still valid if $f$ is a rational function, say $f = g/h$, where $g$, $h$ are polynomials such that $|h(A)| \ne 0$.

† A cyclic arrangement of the ordered set of objects $(a_1, \dots, a_k)$ is any arrangement of the form $(a_i, a_{i+1}, \dots, a_k, a_1, \dots, a_{i-1})$, where $1 \le i \le k$.



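By Theorem 7.3.1,

$$|g(A)| = \prod_{i=1}^{n} g(\lambda_i), \qquad |h(A)| = \prod_{i=1}^{n} h(\lambda_i);$$

in particular, $h(\lambda_i) \ne 0$ $(i = 1, \dots, n)$. Hence

$$|f(A)| = |g(A)\{h(A)\}^{-1}| = \frac{|g(A)|}{|h(A)|} = \prod_{i=1}^{n} \frac{g(\lambda_i)}{h(\lambda_i)} = \prod_{i=1}^{n} f(\lambda_i).$$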

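Theorem 7.3.3. If the characteristic roots of $A$ are $\lambda_1, \dots, \lambda_n$ and if $\phi = f/g$, where $f$, $g$ are polynomials such that $|g(A)| \ne 0$, then the characteristic roots of $\phi(A)$ are $\phi(\lambda_1), \dots, \phi(\lambda_n)$.

For any value of $\lambda$ write

$$\psi(x) = \lambda - \phi(x) = \frac{\lambda g(x) - f(x)}{g(x)}.$$

Since $|g(A)| \ne 0$, Theorem 7.3.2 may be applied to $\psi(x)$. This leads to the relation

$$|\psi(A)| = \psi(\lambda_1) \cdots \psi(\lambda_n),$$

i.e.

$$|\lambda I - \phi(A)| = (\lambda - \phi(\lambda_1)) \cdots (\lambda - \phi(\lambda_n)).$$

This holds for all values of $\lambda$, and the theorem now follows since the left-hand side represents the characteristic polynomial of $\phi(A)$.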


Corollary. If the characteristic roots of $A$ are $\lambda_1, \dots, \lambda_n$ and $k$ is an integer (positive if $A$ is singular), then the characteristic roots of $A^k$ are $\lambda_1^k, \dots, \lambda_n^k$.

Exercise 7.3.2. (i) Show that if, for some positive integer $m$, $A^m = I$, then all characteristic roots of $A$ are roots of unity. (ii) Show, by considering the matrix $\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$, that the converse inference is false.


7.4. The minimum polynomial and the theorem of Cayley 
and Hamilton 

7.4.1. Definition 7.4.1. A polynomial $f$ ANNIHILATES the matrix $A$ if $f(A) = O$.

The existence of a non-zero polynomial annihilating a given matrix $A$ is an immediate consequence of the fact that the set of all $n \times n$ matrices is a linear manifold of dimensionality $n^2$.† This implies that the $n^2 + 1$ matrices $I, A, A^2, \dots, A^{n^2}$ are linearly dependent; thus there exist scalars $c_0, c_1, \dots, c_{n^2}$, not all zero, such that

$$c_0 I + c_1 A + c_2 A^2 + \dots + c_{n^2} A^{n^2} = O. \tag{7.4.1}$$

† See Exercise 3.3.2 (p. 80).




Hence the non-zero polynomial

$$f(\lambda) = c_0 + c_1 \lambda + c_2 \lambda^2 + \dots + c_{n^2} \lambda^{n^2}$$

annihilates $A$.

Instead of appealing to properties of linear manifolds we may also argue directly. The matrix equation (7.4.1) is equivalent to a system of $n^2$ homogeneous linear equations in the $n^2 + 1$ unknowns $c_0, c_1, \dots, c_{n^2}$. Since there are more unknowns than equations, such a system possesses a non-trivial solution. Hence the existence of a non-zero polynomial annihilating $A$ is proved once again.

Among all non-zero polynomials annihilating $A$ we now consider those of least degree, and by multiplying them by suitable non-zero constants we ensure that they are monic (i.e. they have their leading coefficients equal to 1). Any two such polynomials, say $f_1$ and $f_2$, are in fact identical. For otherwise $g = f_1 - f_2$ is a non-zero polynomial of lower degree than $f_1$ and $f_2$, and

$$g(A) = f_1(A) - f_2(A) = O,$$

which contradicts the definition of $f_1$, $f_2$ as non-zero annihilating polynomials of minimum degree. Thus $f_1 = f_2$, and we may introduce the following definition.

Definition 7.4.2. The minimum polynomial of a matrix $A$ is the (unique) monic polynomial of minimum degree which annihilates $A$.

The notion of a minimum polynomial is due to Frobenius (1878). Our previous remarks show that the degree of the minimum polynomial of $A$ is at most equal to $n^2$. This crude bound will be improved in § 7.4.2.
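The linear-dependence argument above can also be read as an algorithm. Test $I, A, A^2, \dots$ one at a time, and stop at the first power that is a linear combination of its predecessors. Below is a minimal Python sketch of that reading, assuming exact arithmetic with `fractions.Fraction`; `express_in_span` and `minimum_polynomial` are hypothetical names. By the argument just given, the loop stops after at most $n^2$ steps.

```python
from fractions import Fraction


def mat_mul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][t] * Y[t][j] for t in range(n)) for j in range(n)] for i in range(n)]


def express_in_span(vectors, target):
    """Return c with sum(c[j] * vectors[j]) == target, or None if target is not in the span.
    Exact Gauss-Jordan elimination on the system whose columns are the given vectors."""
    m, rows = len(vectors), len(target)
    M = [[Fraction(v[i]) for v in vectors] + [Fraction(target[i])] for i in range(rows)]
    pivot_cols, r = [], 0
    for c in range(m):
        p = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        piv = M[r][c]
        M[r] = [x / piv for x in M[r]]
        for i in range(rows):
            if i != r and M[i][c] != 0:
                factor = M[i][c]
                M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        pivot_cols.append(c)
        r += 1
    if any(M[i][m] != 0 for i in range(r, rows)):
        return None                      # inconsistent: target lies outside the span
    coeffs = [Fraction(0)] * m
    for i, c in enumerate(pivot_cols):
        coeffs[c] = M[i][m]
    return coeffs


def minimum_polynomial(A):
    """Monic minimum polynomial of A, coefficients listed from the highest power down."""
    n = len(A)
    flat = lambda X: [x for row in X for x in row]
    powers = [[[int(i == j) for j in range(n)] for i in range(n)]]   # [I]
    while True:
        nxt = mat_mul(powers[-1], A)                                 # next power A^k
        c = express_in_span([flat(P) for P in powers], flat(nxt))
        if c is not None:
            # A^k = c_0 I + ... + c_{k-1} A^{k-1}, so mu = l^k - c_{k-1} l^{k-1} - ... - c_0
            return [1] + [-x for x in reversed(c)]
        powers.append(nxt)


print(minimum_polynomial([[1, 0, 0], [0, 0, 0], [0, 0, -1]]))   # lambda^3 - lambda
```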

We already know that the characteristic polynomial is invariant 
under similarity transformations. The corresponding result will 
now be established for the minimum polynomial. 

Theorem 7.4.1. Similar matrices have the same minimum 
polynomial. 

If $B = S^{-1}AS$, we can at once verify, by induction with respect to $k$, that

$$B^k = S^{-1}A^kS \qquad (k = 0, 1, 2, \dots).$$

It follows that, for every polynomial $f$,

$$f(B) = S^{-1}f(A)S.$$

Hence the same polynomials annihilate $A$ and $B$, and therefore $A$ and $B$ have the same minimum polynomial.




Theorem 7.4.2. The minimum polynomial of a matrix is a divisor of every polynomial which annihilates the matrix.

Denote the minimum polynomial of $A$ by $\mu(\lambda)$, and let $f(\lambda)$ be a polynomial such that $f(A) = O$. By the division algorithm, there exist polynomials $q(\lambda)$, $r(\lambda)$ such that

$$f(\lambda) = q(\lambda)\mu(\lambda) + r(\lambda),$$

where $r(\lambda)$ is either identically equal to 0 or else is of lower degree than $\mu(\lambda)$.†

Hence

$$f(A) = q(A)\mu(A) + r(A),$$

and so $r(A) = O$. Thus, by the minimum property of $\mu(\lambda)$, it follows that $r(\lambda)$ vanishes identically; therefore $f(\lambda) = q(\lambda)\mu(\lambda)$, i.e. $\mu(\lambda)$ is a divisor of $f(\lambda)$.

The existence of a minimum polynomial for every matrix enables 
us to reduce rational functions of matrices to polynomial form. 

Theorem 7.4.3. If $f(\lambda)$, $g(\lambda)$ are scalar polynomials and $g(A)$ is non-singular, then the quotient $f(A)/g(A)$ is equal to a polynomial in $A$ whose coefficients depend on $f$, $g$ and $A$.

It is sufficient to show that $\{g(A)\}^{-1}$ is equal to a polynomial in $A$. We write $S = g(A)$ and denote by

$$\mu(\lambda) = \lambda^k + a_1\lambda^{k-1} + \dots + a_{k-1}\lambda + a_k$$

the minimum polynomial of $S$. Hence

$$S^k + a_1S^{k-1} + \dots + a_{k-1}S + a_kI = O,$$

and so

$$a_kI = -S(S^{k-1} + a_1S^{k-2} + \dots + a_{k-1}I).$$

The assumption that $a_k = 0$ would contradict the definition of $\mu(\lambda)$. Indeed, since $S$ is non-singular, we could then premultiply by $S^{-1}$ and obtain a monic polynomial of degree $k - 1$ annihilating $S$. Hence $a_k \ne 0$ and we have

$$S^{-1} = -a_k^{-1}(S^{k-1} + a_1S^{k-2} + \dots + a_{k-1}I).$$

Thus $\{g(A)\}^{-1}$ is equal to a polynomial in $S$, and so in $A$. The coefficients of this last polynomial depend not only on $g$ but also on $a_1, \dots, a_k$, i.e. on the minimum polynomial of $S = g(A)$; they depend, therefore, on $A$.

The fact that the coefficients of the polynomial whose existence is asserted by Theorem 7.4.3 depend on $A$ is important. It shows that Theorem 7.4.3 does not furnish us with an identity for converting rational functions of matrices into polynomials. Thus, if

† We associate no degree with the identically vanishing polynomial.




$f(A)/g(A) = \phi(A)$, where $f$, $g$, $\phi$ are polynomials, it is not legitimate to conclude that $f(B)/g(B) = \phi(B)$, for the coefficients of $\phi$ depend on $A$ as well as on $f$ and $g$. On the other hand, there exists, of course, a polynomial $\psi$ such that $f(B)/g(B) = \psi(B)$.

The possibility of expressing every rational function of a matrix 
as a polynomial marks a vital point of difference between matrix 
algebra and the algebra of numbers. 

7.4.2. We observed in § 7.4.1 that the minimum polynomial of an $n \times n$ matrix is of degree not greater than $n^2$. We shall now improve on this statement by showing that the degree of the minimum polynomial is, in fact, not greater than $n$. We shall establish this result by determining an annihilating polynomial of degree $n$.

It will be necessary to consider polynomials in a scalar variable $\lambda$ which have matrix coefficients, and we shall always write each power of $\lambda$ to the right of the corresponding matrix coefficient. Let $F(\lambda)$ be such a polynomial, say

$$F(\lambda) = C_0 + C_1\lambda + \dots + C_k\lambda^k. \tag{7.4.2}$$

Then the matrix $F(A)$ is defined by the equation

$$F(A) = C_0 + C_1A + \dots + C_kA^k.$$

Theorem 7.4.4. Let $F(\lambda)$, $G(\lambda)$ be polynomials in $\lambda$ with matrix coefficients, and suppose that they are related by the equation

$$G(\lambda) = F(\lambda) \cdot (\lambda I - A),$$

where $A$ is a given matrix. Then $G(A) = O$.

This result, though very simple to prove, is not quite as trivial as it may look at first sight. It is not, of course, legitimate merely to ‘substitute’ $\lambda = A$ in $F(\lambda) \cdot (\lambda I - A)$, for we must multiply out the factors $F(\lambda)$ and $\lambda I - A$ and arrange the product as a polynomial in $\lambda$ before making the substitution.

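Let $F(\lambda)$ be given by (7.4.2). Then

$$G(\lambda) = F(\lambda) \cdot (\lambda I - A) = -C_0A + (C_0 - C_1A)\lambda + (C_1 - C_2A)\lambda^2 + \dots + (C_{k-1} - C_kA)\lambda^k + C_k\lambda^{k+1},$$

and so

$$G(A) = -C_0A + (C_0 - C_1A)A + (C_1 - C_2A)A^2 + \dots + (C_{k-1} - C_kA)A^k + C_kA^{k+1} = O,$$

since the terms on the right cancel in pairs.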




We next come to our principal result. 

Theorem 7.4.5. (Theorem of Cayley and Hamilton) 

Every matrix satisfies its own characteristic equation, i.e. if $\chi(\lambda)$ is the characteristic polynomial of $A$, then $\chi(A) = O$.

This theorem was established by Hamilton for a special class of 
matrices in 1853. Five years later Cayley enunciated the general 
result without proof. 

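Write

$$\chi(\lambda) = |\lambda I - A| = \lambda^n + c_1\lambda^{n-1} + \dots + c_{n-1}\lambda + c_n.$$

The adjugate matrix of $\lambda I - A$ has the form

$$(\lambda I - A)^* = \begin{pmatrix} p_{11}(\lambda) & \cdots & p_{1n}(\lambda) \\ \vdots & & \vdots \\ p_{n1}(\lambda) & \cdots & p_{nn}(\lambda) \end{pmatrix},$$

where the $p_{ij}(\lambda)$ are polynomials in $\lambda$. Clearly, then, $(\lambda I - A)^*$ may be written as a polynomial in $\lambda$ having matrix coefficients. Now, by Theorem 3.5.2 (p. 88),

$$I\lambda^n + c_1I\lambda^{n-1} + \dots + c_{n-1}I\lambda + c_nI = (\lambda I - A)^*(\lambda I - A),$$

and hence, by Theorem 7.4.4,

$$A^n + c_1A^{n-1} + \dots + c_{n-1}A + c_nI = O,$$

i.e. $\chi(A) = O$.†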
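The Cayley-Hamilton theorem is easy to check by machine on particular matrices. The sketch below uses Python with exact `fractions.Fraction` arithmetic. It computes $\chi(\lambda)$ from principal minors (Theorem 7.1.2) and then evaluates $\chi(A)$ by Horner's rule. All helper names and the test matrix are illustrative assumptions. Such a check is, of course, no substitute for the proof.

```python
from fractions import Fraction
from itertools import combinations


def det(M):
    """Determinant by Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    n, result = len(M), Fraction(1)
    for c in range(n):
        p = next((r for r in range(c, n) if M[r][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            M[c], M[p] = M[p], M[c]
            result = -result
        result *= M[c][c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            M[r] = [a - factor * b for a, b in zip(M[r], M[c])]
    return result


def char_poly_coeffs(A):
    """[1, c_1, ..., c_n] from signed sums of principal minors (Theorem 7.1.2)."""
    n = len(A)
    coeffs = [Fraction(1)]
    for k in range(1, n + 1):
        total = sum(det([[A[i][j] for j in idx] for i in idx])
                    for idx in combinations(range(n), k))
        coeffs.append((-1) ** k * total)
    return coeffs


def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][t] * Y[t][j] for t in range(n)) for j in range(n)] for i in range(n)]


def poly_at_matrix(coeffs, A):
    """Horner evaluation of a scalar polynomial (highest power first) at a square matrix."""
    n = len(A)
    P = [[0] * n for _ in range(n)]
    for c in coeffs:
        P = mat_mul(P, A)
        for i in range(n):
            P[i][i] += c
    return P


A = [[2, -1, 0, 3], [1, 4, 1, 0], [0, 2, -3, 1], [5, 0, 1, 1]]
chi = char_poly_coeffs(A)
chi_of_A = poly_at_matrix(chi, A)
print(chi)
print(all(x == 0 for row in chi_of_A for x in row))   # chi(A) = O
```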

Exercise 7.4.1. Discuss the fallacy involved in ‘proving’ the theorem of Cayley-Hamilton by substituting $\lambda = A$ in $\chi(\lambda) = |\lambda I - A|$.

The manipulation of matrices is often greatly facilitated by the Cayley-Hamilton theorem, which provides an easy method for expressing any polynomial in $A$ as a polynomial of degree not exceeding $n - 1$. For, if $f(\lambda)$ is any polynomial, then there exist polynomials $q(\lambda)$, $r(\lambda)$ such that

$$f(\lambda) = q(\lambda)\chi(\lambda) + r(\lambda),$$

where $r(\lambda)$ is either identically zero or else is of degree less than the degree of $\chi(\lambda)$, i.e. less than $n$. Hence, in view of the Cayley-Hamilton theorem,

$$f(A) = q(A)\chi(A) + r(A) = r(A).$$

† For a less sophisticated version of the same proof see Ferrar, 2, 111-12.





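As an example, let us compute $2A^8 - 3A^5 + A^4 + A^2 - 4I$, where $A$ is the matrix

$$A = \begin{pmatrix} 1 & 0 & 2 \\ 0 & -1 & 1 \\ 0 & 1 & 0 \end{pmatrix}.$$

The characteristic polynomial of $A$ is

$$\begin{vmatrix} \lambda - 1 & 0 & -2 \\ 0 & \lambda + 1 & -1 \\ 0 & -1 & \lambda \end{vmatrix} = \lambda^3 - 2\lambda + 1.$$

Now, long division of $2\lambda^8 - 3\lambda^5 + \lambda^4 + \lambda^2 - 4$ by $\lambda^3 - 2\lambda + 1$ leaves the remainder

$$r(\lambda) = 24\lambda^2 - 37\lambda + 10.$$

Hence $2A^8 - 3A^5 + A^4 + A^2 - 4I = 24A^2 - 37A + 10I$. But

$$A^2 = \begin{pmatrix} 1 & 2 & 2 \\ 0 & 2 & -1 \\ 0 & -1 & 1 \end{pmatrix},$$

and therefore

$$2A^8 - 3A^5 + A^4 + A^2 - 4I = \begin{pmatrix} -3 & 48 & -26 \\ 0 & 95 & -61 \\ 0 & -61 & 34 \end{pmatrix}.$$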

Again, the Cayley-Hamilton theorem may be used to express 
rational functions of a given matrix A in polynomial form. This is 
achieved by carrying out the process explained in the proof of 
Theorem 7.4.3 but using the characteristic polynomial in place of 
the minimum polynomial. 
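Carrying out this procedure with $\chi(\lambda) = \lambda^n + c_1\lambda^{n-1} + \dots + c_n$ for a non-singular matrix $S$ gives $S^{-1} = -c_n^{-1}(S^{n-1} + c_1S^{n-2} + \dots + c_{n-1}I)$. Here $c_n = (-1)^n|S|$, which is non-zero precisely when $S$ is non-singular. Below is a Python sketch of this formula, assuming exact arithmetic with `fractions.Fraction`. The helper names, including `inverse_via_characteristic_polynomial`, are hypothetical, and $\chi$ is computed from principal minors as in Theorem 7.1.2.

```python
from fractions import Fraction
from itertools import combinations


def det(M):
    """Determinant by Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    n, result = len(M), Fraction(1)
    for c in range(n):
        p = next((r for r in range(c, n) if M[r][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            M[c], M[p] = M[p], M[c]
            result = -result
        result *= M[c][c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            M[r] = [a - factor * b for a, b in zip(M[r], M[c])]
    return result


def char_poly_coeffs(A):
    """[1, c_1, ..., c_n] from signed sums of principal minors (Theorem 7.1.2)."""
    n = len(A)
    coeffs = [Fraction(1)]
    for k in range(1, n + 1):
        total = sum(det([[A[i][j] for j in idx] for i in idx])
                    for idx in combinations(range(n), k))
        coeffs.append((-1) ** k * total)
    return coeffs


def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][t] * Y[t][j] for t in range(n)) for j in range(n)] for i in range(n)]


def poly_at_matrix(coeffs, A):
    """Horner evaluation of a scalar polynomial (highest power first) at a square matrix."""
    n = len(A)
    P = [[0] * n for _ in range(n)]
    for c in coeffs:
        P = mat_mul(P, A)
        for i in range(n):
            P[i][i] += c
    return P


def inverse_via_characteristic_polynomial(S):
    """S^{-1} = -(1/c_n)(S^{n-1} + c_1 S^{n-2} + ... + c_{n-1} I)."""
    c = char_poly_coeffs(S)
    if c[-1] == 0:
        raise ValueError("S is singular")
    Q = poly_at_matrix(c[:-1], S)      # S^{n-1} + c_1 S^{n-2} + ... + c_{n-1} I
    return [[-x / c[-1] for x in row] for row in Q]


S = [[2, 1], [1, 1]]
print(inverse_via_characteristic_polynomial(S))
```

The design follows the text closely. The only matrix operations needed are products and additions, which is what makes this a polynomial expression in $S$.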

7.4.3. Theorems 7.4.2 and 7.4.5 lead at once to the following consequence.

Theorem 7.4.6. The minimum polynomial of a matrix divides its characteristic polynomial.

This theorem furnishes us with no information as to whether the minimum polynomial and the characteristic polynomial are equal or not. In fact, equality occurs in some but not in all cases. Thus, if $A = \operatorname{dg}(1, 0, -1)$, the minimum polynomial $\mu(\lambda)$ and the characteristic polynomial $\chi(\lambda)$ are both equal to $\lambda(\lambda - 1)(\lambda + 1)$. On the other hand, if $A = \dots$, then $\mu(\lambda) = \dots$, $\chi(\lambda) = \dots$. Both these examples illustrate the following general result.
following general result. 

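Theorem 7.4.7. The distinct linear factors of the minimum polynomial coincide with those of the characteristic polynomial.

Let $\lambda_1, \dots, \lambda_k$ be the distinct characteristic roots of $A$, and denote their multiplicities by $\nu_1, \dots, \nu_k$ respectively, so that $\nu_1, \dots, \nu_k \ge 1$ and $\nu_1 + \dots + \nu_k = n$. The characteristic polynomial $\chi(\lambda)$ is then given by

$$\chi(\lambda) = (\lambda - \lambda_1)^{\nu_1} \cdots (\lambda - \lambda_k)^{\nu_k},$$

and so, by Theorem 7.4.6, the minimum polynomial $\mu(\lambda)$ must be of the form

$$\mu(\lambda) = (\lambda - \lambda_1)^{\beta_1} \cdots (\lambda - \lambda_k)^{\beta_k},$$

where $0 \le \beta_i \le \nu_i$ $(i = 1, \dots, k)$. Assume now that some $\beta_j$ is equal to 0. Then $\mu(\lambda_j) \ne 0$, and so, by Theorem 7.3.3 (p. 202), the matrix $\mu(A)$ possesses at least one non-zero characteristic root, namely $\mu(\lambda_j)$. Hence $\mu(A) \ne O$, contrary to hypothesis. Thus $\beta_1, \dots, \beta_k \ge 1$, and the theorem is proved.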

Corollary. If the characteristic roots of a matrix are distinct, then its minimum polynomial and its characteristic polynomial are equal.†

7.5. Estimates of characteristic roots 

In the present section we shall establish a number of results 
giving information about the position of characteristic roots in 
the complex plane. 

7.5.1. We have already met the notion of a symmetric matrix 
(Definition 6.4.2, p. 183). Another special class of matrices is 
specified by the next definition. 

Definition 7.5.1. The matrix $A = (a_{rs})$ is skew-symmetric if $a_{rs} = -a_{sr}$ $(r, s = 1, \dots, n)$, i.e. if $A' = -A$.

Exercise 7.5.1. Show that all the diagonal elements of a skew-symmetric matrix are equal to zero.

The notions of symmetry and skew-symmetry are generally only of interest when the matrix in question is real. For matrices over the complex field it is desirable to modify these notions by introducing complex conjugates. If $A = (a_{rs})$ we shall, as previously, use the symbol $\bar{A}$ to denote the conjugate matrix $(\bar{a}_{rs})$.


† For further information on the minimum polynomial see MacDuffee, 8, 77-86.




Exercise 7.5.2. Show that (i) $\overline{A'} = (\bar{A})'$; (ii) $\overline{AB} = \bar{A}\bar{B}$; (iii) $\overline{A^*} = (\bar{A})^*$; (iv) $\overline{A^{-1}} = (\bar{A})^{-1}$ if $|A| \ne 0$.

Definition 7.5.2. The {complex) matrix A = (a^^) is hbrmitian 
if (r, s == 1 ,..., n), i,e. i/ A = A^; it is skew-hermitian 

if ^rs = ^ == 71), t.C. i/ A == — A^. 

Hermitian matrices are so named after Hermite, who was the 
first to discuss their properties. 

For real matrices the terms ‘hermitian’ and ‘symmetric’ 
evidently coincide, as do the terms ‘skew-hermitian’ and ‘skew- 
symmetric’. 

Exercise 7.5.3. Show that a hermitian matrix all of whose elements are purely imaginary† is skew-symmetric.

Exercise 7.5.4. Show that if $A$ is skew-hermitian, then $iA$ is hermitian.

Theorem 7.5.1. The characteristic roots of a hermitian matrix, 
and in particular of a real symmetric matrix, are real. 

For real symmetric matrices this result was established by 
Cauchy in 1829; the general theorem for hermitian matrices is due 
to Hermite (1855). 

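Let $A$ be a hermitian matrix and $\lambda$ any characteristic root of $A$. Then $|\lambda I - A| = 0$ and, in view of Corollary 1 to Theorem 5.4.1 (p. 149), there exists a (possibly complex) vector $x \ne 0$ such that $(\lambda I - A)x = 0$. Thus $Ax = \lambda x$; and, premultiplying by $\bar{x}'$, we obtain

$$\bar{x}'Ax = \lambda \bar{x}'x. \tag{7.5.1}$$

On the other hand, $Ax = \lambda x$ implies $\bar{x}'\bar{A}' = \bar{\lambda}\bar{x}'$. Since $A$ is hermitian, $\bar{A}' = A$; and hence, postmultiplying by $x$,

$$\bar{x}'Ax = \bar{\lambda}\bar{x}'x. \tag{7.5.2}$$

Comparing (7.5.1) and (7.5.2) we infer that $(\lambda - \bar{\lambda})\bar{x}'x = 0$. But $\bar{x}'x = |x_1|^2 + \dots + |x_n|^2 \ne 0$, and therefore $\lambda - \bar{\lambda} = 0$, i.e. $\lambda$ is real.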

Exercise 7.5.5. Write out the proof of the preceding theorem without 
making use of the notion of matrix multiplication. 

Corollary. The characteristic roots of a skew-hermitian matrix are purely imaginary.

For let $A$ be skew-hermitian. Then, by Exercise 7.5.4, $iA$ is hermitian. If $\lambda$ is any characteristic root of $A$, then $|i\lambda I - iA| = 0$ and so, by Theorem 7.5.1, $i\lambda$ is real, i.e. $\lambda$ is purely imaginary.

† A complex number is said to be purely imaginary if its real part is equal to zero. In particular, the complex number 0 is purely imaginary.





The relation just proved was first noted (for real skew-symmetric 
matrices) by Clebsch in 1863. 

A quantitative result which contains Theorem 7.5.1 for the case of real matrices was proved by Bendixson in 1902.

Theorem 7.5.2. Let $A = (a_{rs})$ be a real $n \times n$ matrix and write

$$\alpha = \max_{1 \le r, s \le n} \tfrac{1}{2}|a_{rs} - a_{sr}|.$$

If $\lambda$ is any characteristic root of $A$, then

$$|\operatorname{Im} \lambda| \le \alpha \sqrt{n(n-1)/2}.$$

The number $\alpha$ may be regarded as a measure of the deviation of $A$ from symmetry. $A$ is evidently symmetric if and only if $\alpha = 0$, and in that case the asserted inequality implies the reality of the characteristic roots.

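Let $x = (x_1, \dots, x_n)'$ be a non-zero vector such that $Ax = \lambda x$ and assume, without loss of generality, that $\bar{x}'x = 1$. Then

$$\lambda = \bar{x}'Ax,$$

and so, since $A$ is real and a $1 \times 1$ matrix is equal to its transpose,

$$\bar{\lambda} = x'A\bar{x} = \bar{x}'A'x.$$

Hence

$$2i \operatorname{Im} \lambda = \lambda - \bar{\lambda} = \bar{x}'(A - A')x.$$

It is easily verified that the expression on the right-hand side is equal to $\sum_{r,s=1}^{n} (a_{rs} - a_{sr})\bar{x}_r x_s$. Since the left-hand side is purely imaginary, this sum is equal to half the difference between itself and its complex conjugate. It follows that

$$2i \operatorname{Im} \lambda = \tfrac{1}{2} \sum_{r,s=1}^{n} (a_{rs} - a_{sr})(\bar{x}_r x_s - x_r \bar{x}_s).$$

Since $|a_{rs} - a_{sr}| \le 2\alpha$ and the terms with $r = s$ vanish, we obtain

$$2|\operatorname{Im} \lambda| \le \alpha \sum_{\substack{r,s=1 \\ r \ne s}}^{n} |\bar{x}_r x_s - x_r \bar{x}_s|. \tag{7.5.3}$$

Now for any real non-negative numbers $p_1, \dots, p_k$ we have†

$$(p_1 + \dots + p_k)^2 \le k(p_1^2 + \dots + p_k^2). \tag{7.5.4}$$

Hence, since the right-hand side in (7.5.3) contains $n(n-1)$ terms, we have

$$4|\operatorname{Im} \lambda|^2 \le \alpha^2 n(n-1) \sum_{\substack{r,s=1 \\ r \ne s}}^{n} |\bar{x}_r x_s - x_r \bar{x}_s|^2. \tag{7.5.5}$$

But $|\bar{x}_r x_s - x_r \bar{x}_s|^2 = 2|x_r|^2|x_s|^2 - 2\operatorname{Re}(\bar{x}_r^2 x_s^2)$, and so

$$\sum_{r,s=1}^{n} |\bar{x}_r x_s - x_r \bar{x}_s|^2 = 2\sum_{r=1}^{n} |x_r|^2 \sum_{s=1}^{n} |x_s|^2 - 2\operatorname{Re}\Big(\sum_{r=1}^{n} \bar{x}_r^2 \sum_{s=1}^{n} x_s^2\Big) = 2 - 2|t|^2,$$

where $t = \sum_{r=1}^{n} x_r^2$. Hence

$$\sum_{\substack{r,s=1 \\ r \ne s}}^{n} |\bar{x}_r x_s - x_r \bar{x}_s|^2 \le 2,$$

and the theorem follows by (7.5.5).

† This inequality is a special case of the inequality of Cauchy and Schwarz, which was mentioned in the proof of Theorem 2.5.2 (p. 63).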
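For $n = 2$ the characteristic roots come from the quadratic formula, which makes Bendixson's bound easy to test. The Python sketch below is restricted to real $2 \times 2$ matrices; for larger $n$ a numerical eigenvalue routine would be needed, and that is not shown. `eigenvalues_2x2` and `bendixson_bound` are hypothetical names, and a small tolerance absorbs floating-point rounding. When $n = 2$ the bound reduces to $|\operatorname{Im} \lambda| \le \alpha$.

```python
import cmath
import math


def eigenvalues_2x2(A):
    """Both characteristic roots of a 2 x 2 matrix, from lambda^2 - (tr A) lambda + |A| = 0."""
    (a, b), (c, d) = A
    tr, dt = a + d, a * d - b * c
    root = cmath.sqrt(tr * tr - 4 * dt)
    return [(tr + root) / 2, (tr - root) / 2]


def bendixson_bound(A):
    """alpha * sqrt(n(n-1)/2), with alpha = max (1/2)|a_rs - a_sr| (Theorem 7.5.2)."""
    n = len(A)
    alpha = max(abs(A[r][s] - A[s][r]) / 2 for r in range(n) for s in range(n))
    return alpha * math.sqrt(n * (n - 1) / 2)


examples = [[[0, 1], [-1, 0]], [[1, 3], [-1, 2]], [[5, 2], [2, -1]]]
for A in examples:
    bound = bendixson_bound(A)
    for lam in eigenvalues_2x2(A):
        print(A, lam, abs(lam.imag) <= bound + 1e-12)
```

The first example attains the bound: its roots are $\pm i$ and $\alpha = 1$. The last example is symmetric, so $\alpha = 0$ and its roots are real.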

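Other useful estimates were obtained by Hirsch in 1902.

Theorem 7.5.3. If $A = (a_{rs})$ is a complex $n \times n$ matrix and

$$\rho = \max_{1 \le r, s \le n} |a_{rs}|, \qquad \sigma = \max_{1 \le r, s \le n} \tfrac{1}{2}|a_{rs} + \bar{a}_{sr}|, \qquad \tau = \max_{1 \le r, s \le n} \tfrac{1}{2}|a_{rs} - \bar{a}_{sr}|,$$

then every characteristic root $\lambda$ of $A$ satisfies the inequalities

$$|\lambda| \le n\rho, \qquad |\operatorname{Re} \lambda| \le n\sigma, \qquad |\operatorname{Im} \lambda| \le n\tau.$$

The third inequality implies, of course, Theorem 7.5.1, and the second implies its corollary. All three inequalities will be sharpened in § 10.4.2.

Let $C = (c_{rs})$ be any complex matrix, and suppose that $|c_{rs}| \le \kappa$ $(r, s = 1, \dots, n)$. If $x$ is any complex unit vector (i.e. $\bar{x}'x = 1$), then

$$|\bar{x}'Cx| = \Big|\sum_{r,s=1}^{n} c_{rs}\bar{x}_r x_s\Big| \le \kappa \sum_{r,s=1}^{n} |x_r||x_s| = \kappa\Big(\sum_{r=1}^{n} |x_r|\Big)^2.$$

Hence, by (7.5.4), $|\bar{x}'Cx| \le n\kappa \sum_{r=1}^{n} |x_r|^2$, and so

$$|\bar{x}'Cx| \le n\kappa. \tag{7.5.6}$$

Now, since $\lambda$ is a characteristic root of $A$, there exists a complex unit vector $x$ such that $Ax = \lambda x$. Hence $\lambda = \bar{x}'Ax$ and so, by (7.5.6),

$$|\lambda| = |\bar{x}'Ax| \le n\rho.$$

Again, $\bar{\lambda} = \bar{x}'\bar{A}'x$, and therefore

$$2\operatorname{Re} \lambda = \lambda + \bar{\lambda} = \bar{x}'(A + \bar{A}')x, \qquad 2i\operatorname{Im} \lambda = \lambda - \bar{\lambda} = \bar{x}'(A - \bar{A}')x.$$

The elements of $A + \bar{A}'$ and $A - \bar{A}'$ do not exceed $2\sigma$ and $2\tau$, respectively, in modulus. Hence the remaining assertions again follow by (7.5.6).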
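Hirsch's inequalities can be checked the same way for complex $2 \times 2$ matrices, whose roots again come from the quadratic formula. The Python sketch below is only an illustration: `eigenvalues_2x2` and `hirsch_bounds` are hypothetical names, and a tolerance of $10^{-9}$ absorbs floating-point rounding. The examples include a hermitian matrix, for which $\tau = 0$, and a skew-hermitian one, for which $\sigma = 0$.

```python
import cmath


def eigenvalues_2x2(A):
    """Both characteristic roots of a 2 x 2 matrix, from lambda^2 - (tr A) lambda + |A| = 0."""
    (a, b), (c, d) = A
    tr, dt = a + d, a * d - b * c
    root = cmath.sqrt(tr * tr - 4 * dt)
    return [(tr + root) / 2, (tr - root) / 2]


def hirsch_bounds(A):
    """Return (n*rho, n*sigma, n*tau) as defined in Theorem 7.5.3."""
    n = len(A)
    pairs = [(r, s) for r in range(n) for s in range(n)]
    rho = max(abs(A[r][s]) for r, s in pairs)
    sigma = max(abs(A[r][s] + A[s][r].conjugate()) / 2 for r, s in pairs)
    tau = max(abs(A[r][s] - A[s][r].conjugate()) / 2 for r, s in pairs)
    return n * rho, n * sigma, n * tau


examples = [
    [[2, 1 - 1j], [1 + 1j, 3]],        # hermitian: tau = 0
    [[1j, 2 + 1j], [-2 + 1j, -3j]],    # skew-hermitian: sigma = 0
    [[1 + 2j, -1], [0.5j, 2 - 1j]],    # no special structure
]
for A in examples:
    mod_bound, re_bound, im_bound = hirsch_bounds(A)
    for lam in eigenvalues_2x2(A):
        print(abs(lam) <= mod_bound + 1e-9,
              abs(lam.real) <= re_bound + 1e-9,
              abs(lam.imag) <= im_bound + 1e-9)
```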




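7.5.2. Theorem 7.5.4. If $A = (a_{rs})$ is a complex $n \times n$ matrix and

$$\rho_k = \sum_{\substack{s=1 \\ s \ne k}}^{n} |a_{ks}| \qquad (k = 1, \dots, n),$$

then every characteristic root $\lambda$ of $A$ lies in at least one of the circles specified by the inequalities

$$|z - a_{kk}| \le \rho_k \qquad (k = 1, \dots, n).$$

Let $x = (x_1, \dots, x_n)' \ne 0$ satisfy the equation $Ax = \lambda x$, so that

$$(\lambda - a_{rr})x_r = \sum_{\substack{s=1 \\ s \ne r}}^{n} a_{rs}x_s \qquad (r = 1, \dots, n).$$

Denote by $x_k$ the greatest, in modulus, among the numbers $x_1, \dots, x_n$. Then $x_k \ne 0$ and we have

$$|\lambda - a_{kk}|\,|x_k| = \Big|\sum_{\substack{s=1 \\ s \ne k}}^{n} a_{ks}x_s\Big| \le \sum_{\substack{s=1 \\ s \ne k}}^{n} |a_{ks}|\,|x_s| \le \sum_{\substack{s=1 \\ s \ne k}}^{n} |a_{ks}|\,|x_k| = \rho_k |x_k|.$$

Hence $|\lambda - a_{kk}| \le \rho_k$, as asserted.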
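The circles of Theorem 7.5.4 are nowadays usually called Gershgorin discs. The Python sketch below lists them and checks that each characteristic root of a $2 \times 2$ example lies in at least one of them. It assumes the quadratic formula for the roots, so it does not handle general $n$. `gershgorin_discs`, `eigenvalues_2x2` and `in_some_disc` are hypothetical names.

```python
import cmath


def gershgorin_discs(A):
    """(centre a_kk, radius rho_k) for k = 1, ..., n, as in Theorem 7.5.4."""
    n = len(A)
    return [(A[k][k], sum(abs(A[k][s]) for s in range(n) if s != k)) for k in range(n)]


def eigenvalues_2x2(A):
    """Both characteristic roots of a 2 x 2 matrix, from lambda^2 - (tr A) lambda + |A| = 0."""
    (a, b), (c, d) = A
    tr, dt = a + d, a * d - b * c
    root = cmath.sqrt(tr * tr - 4 * dt)
    return [(tr + root) / 2, (tr - root) / 2]


def in_some_disc(z, discs, tol=1e-12):
    """True if z lies in at least one of the closed discs (up to a rounding tolerance)."""
    return any(abs(z - centre) <= radius + tol for centre, radius in discs)


A = [[4, 1], [2, -3]]
discs = gershgorin_discs(A)     # centres 4 and -3, radii 1 and 2
print(discs)
print([in_some_disc(lam, discs) for lam in eigenvalues_2x2(A)])
```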


The theorem just proved shows, in particular, that if $A = (a_{rs})$ is dominated by its diagonal elements, i.e. if

$$|a_{kk}| > \sum_{\substack{s=1 \\ s \ne k}}^{n} |a_{ks}| \qquad (k = 1, \dots, n),$$

then each characteristic root $\lambda$ of $A$ satisfies at least one of the inequalities $|\lambda - a_{kk}| < |a_{kk}|$ $(k = 1, \dots, n)$.

This implies, of course, that no characteristic root of $A$ is equal to zero, i.e. that $A$ is non-singular. This is a result with which we are already familiar in view of Theorem 1.6.4 (p. 32).

Exercise 7.5.6. Show that if $A = (a_{rs})$ is dominated by its diagonal elements, and

$$\delta = \min_{1 \le k \le n} \Big(|a_{kk}| - \sum_{\substack{s=1 \\ s \ne k}}^{n} |a_{ks}|\Big),$$

then

$$|\det A| \ge \delta^n. \tag{7.5.7}$$

The following lower estimate for $|\det A|$, more precise than (7.5.7), was given by Ostrowski (1937).




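Theorem 7.5.5. Let $A = (a_{rs})$ be a complex $n \times n$ matrix which is dominated by its diagonal elements. If

$$d_r = |a_{rr}| - \sum_{\substack{s=1 \\ s \ne r}}^{n} |a_{rs}| \qquad (r = 1, \dots, n),$$

then $|\det A| \ge d_1 d_2 \cdots d_n$.†

Consider the matrix $B = (b_{rs})$ specified by the equations

$$b_{rs} = a_{rs}/d_r \qquad (r, s = 1, \dots, n).$$

Then

$$\det A = d_1 \cdots d_n \det B. \tag{7.5.8}$$

If $\lambda$ is any characteristic root of $B$ we have, by Theorem 7.5.4,

$$|\lambda - b_{kk}| \le \sum_{\substack{s=1 \\ s \ne k}}^{n} |b_{ks}|$$

for some value of $k$. Hence

$$|\lambda| \ge |b_{kk}| - \sum_{\substack{s=1 \\ s \ne k}}^{n} |b_{ks}| = 1,$$

and so, by Theorem 7.1.1, $|\det B| \ge 1$. The assertion now follows by (7.5.8).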

We conclude our series of estimates with another interesting 
consequence of Theorem 7.5.4. 

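Theorem 7.5.6. For any complex $n \times n$ matrix $A$ we have

$$|\det A| \le \min(R^n, C^n),$$

where $R = \max_k R_k$, $C = \max_k C_k$, with $R_k$ ($C_k$) being the sum of the absolute values of the elements in the $k$-th row ($k$-th column) of $A$.

If $\lambda$ is any characteristic root of $A = (a_{rs})$, then, for a suitable value of $k$, we have

$$|\lambda - a_{kk}| \le \sum_{\substack{s=1 \\ s \ne k}}^{n} |a_{ks}|.$$

Hence $|\lambda| \le R_k \le R$, and so, by Theorem 7.1.1, $|\det A| \le R^n$. Applying this to $A'$, whose rows are the columns of $A$, we obtain

$$|\det A| = |\det A'| \le C^n,$$

and the theorem therefore follows.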
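Theorems 7.5.5 and 7.5.6 bracket $|\det A|$ from below and above. The Python sketch below computes both bounds, together with the weaker estimate (7.5.7), for an illustrative diagonally dominant integer matrix. It assumes real entries, so that `fractions.Fraction` can give an exact determinant. The helper names are hypothetical.

```python
from fractions import Fraction


def det(M):
    """Determinant by Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    n, result = len(M), Fraction(1)
    for c in range(n):
        p = next((r for r in range(c, n) if M[r][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            M[c], M[p] = M[p], M[c]
            result = -result
        result *= M[c][c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            M[r] = [a - factor * b for a, b in zip(M[r], M[c])]
    return result


def row_excess(A):
    """d_r = |a_rr| - (sum over s != r of |a_rs|), for r = 1, ..., n."""
    n = len(A)
    return [abs(A[r][r]) - sum(abs(A[r][s]) for s in range(n) if s != r) for r in range(n)]


def ostrowski_lower_bound(A):
    """d_1 d_2 ... d_n from Theorem 7.5.5 (requires every d_r > 0)."""
    d = row_excess(A)
    if min(d) <= 0:
        raise ValueError("A is not dominated by its diagonal elements")
    product = 1
    for x in d:
        product *= x
    return product


def row_column_upper_bound(A):
    """min(R^n, C^n) from Theorem 7.5.6."""
    n = len(A)
    R = max(sum(abs(x) for x in row) for row in A)
    C = max(sum(abs(A[r][k]) for r in range(n)) for k in range(n))
    return min(R ** n, C ** n)


A = [[4, 1, 1], [0, 3, -1], [1, 1, 5]]
delta = min(row_excess(A))
print(delta ** len(A), ostrowski_lower_bound(A), abs(det(A)), row_column_upper_bound(A))
```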

† A still more precise inequality was stated, without proof, in § 1.6.4.




7.6. Characteristic vectors 

The determination of the united points of a collineation leads, as we have seen in § 7.1.1, to a system of equations having the form $Ax = \lambda x$; and this, in turn, entails the investigation of the characteristic equation $|\lambda I - A| = 0$. It is, however, the vector $x$ rather than the scalar $\lambda$ that is needed in the geometrical problem; and the situation is similar in many other applications. Furthermore, the study of vectors of this type is indispensable in the further development of matrix theory.

Definition 7.6.1. Let $\lambda$ be a characteristic root of $A$. Then any non-zero vector $x$ which satisfies the relation

$$(\lambda I - A)x = 0$$

(i.e. $Ax = \lambda x$) is called a characteristic vector of $A$, and it is said to be associated with the characteristic root $\lambda$.

Since $|\lambda I - A| = 0$, we know that characteristic vectors associated with $\lambda$ do, in fact, exist; and it is clear that the set of all these vectors, augmented by the zero vector, is a vector space. The problem of determining the dimensionality of this space (a problem which is equivalent to the determination of the rank of $\lambda I - A$) is important but cannot be treated at all fully at this stage. We shall, however, obtain one interesting result, and then return to the topic in § 10.2.

Theorem 7.6.1. (Rank-multiplicity theorem)

For every $n \times n$ matrix $A$ and every number $\omega$ we have

$R(\omega I - A) \ge n - m_A(\omega)$.† (7.6.1)

When $\omega$ is not a characteristic root of A, the theorem is true trivially. When $1 \le k \le n$ and $\omega$ is a k-fold characteristic root of A, then the theorem asserts that there exist at most k linearly independent characteristic vectors associated with $\omega$. The equivalence of the two ways of formulating the theorem is an immediate consequence of Theorem 5.4.2 (p. 149).

The assertion is obviously true for k = n and we may, therefore, assume that $1 \le k < n$. By Theorem 7.3.3 (p. 202), 0 is a k-fold characteristic root of $\omega I - A$, and hence

† The reader is reminded that $m_\omega(A)$ stands for the multiplicity of $\omega$ as characteristic root of A.



$|\lambda I - (\omega I - A)| = \lambda^n + c_1\lambda^{n-1} + \cdots + c_{n-k}\lambda^k,$

where $c_{n-k} \ne 0$. But, by Theorem 7.1.2 (p. 197), $(-1)^{n-k}c_{n-k}$ is equal to the sum of certain (n − k)-rowed minors of $\omega I - A$. Hence at least one of these minors does not vanish; thus $R(\omega I - A) \ge n - k$, and (7.6.1) is proved.


It is possible to give an alternative proof in which we make use of Theorem 2.3.6 instead of Theorem 7.1.2. The assertion is trivial when $R(\omega I - A) = 0$ or n, and we may, therefore, assume that $0 < s < n$, where $s = n - R(\omega I - A)$. By Theorem 5.4.2, s linearly independent vectors $x_1, \ldots, x_s$ can be found such that

$(\omega I - A)x_i = 0 \quad (i = 1, \ldots, s).$

By the corollary to Theorem 2.3.5 (p. 54), there exist vectors $x_{s+1}, \ldots, x_n$ such that $x_1, \ldots, x_s, x_{s+1}, \ldots, x_n$ are linearly independent. Let X denote the (non-singular) matrix defined by the relations $X_{*i} = x_i$ $(i = 1, \ldots, n)$. We then have

$\{X^{-1}(\omega I - A)X\}_{*i} = \{X^{-1}(\omega I - A)\}X_{*i} = X^{-1}(\omega I - A)x_i = 0 \quad (i = 1, \ldots, s),$

and so the first s columns of $X^{-1}(\omega I - A)X$ consist entirely of zeros. By Laplace's expansion theorem (Theorem 1.4.4, p. 21) it now follows that the characteristic polynomial of $X^{-1}(\omega I - A)X$, and so of $\omega I - A$, has the form $\lambda^s\psi(\lambda)$, where $\psi$ is a certain polynomial. Thus 0 is at least an s-fold characteristic root of $\omega I - A$, i.e. $\omega$ is at least an s-fold characteristic root of A. Hence $k \ge s$, and this implies (7.6.1).


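It should be noted that strict inequality in (7.6.1) may actually occur. Thus consider the case

$A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}.$

Here

$\omega = 1, \quad k = 2, \quad \omega I - A = \begin{pmatrix} 0 & -1 \\ 0 & 0 \end{pmatrix},$

and $1 = R(\omega I - A) > n - k = 0$. On the other hand, equality in (7.6.1) may also occur, as can be seen by considering the case $A = I_2$, $\omega = 1$, $k = 2$.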
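Both situations can be checked mechanically by computing ranks exactly with Python's `fractions.Fraction`. As an illustration, we take the 2 × 2 matrix with rows (1, 1) and (0, 1), and $I_2$. For each of these, $\omega = 1$ is a double characteristic root. The helper names `rank` and `shift` are our own.

```python
from fractions import Fraction

def rank(M):
    """Rank of a rational matrix, by exact Gauss-Jordan elimination."""
    A = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(A), len(A[0])
    r = 0                                    # number of pivots found so far
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if A[i][c] != 0), None)
        if pivot is None:
            continue
        A[r], A[pivot] = A[pivot], A[r]
        for i in range(rows):
            if i != r and A[i][c] != 0:
                f = A[i][c] / A[r][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[r])]
        r += 1
    return r

def shift(A, w):
    """Return the matrix wI - A."""
    n = len(A)
    return [[(w if i == j else 0) - A[i][j] for j in range(n)] for i in range(n)]

n, k, w = 2, 2, 1                  # omega = 1 is a 2-fold root in both examples
J = [[1, 1], [0, 1]]
I2 = [[1, 0], [0, 1]]
print(rank(shift(J, w)), n - k)    # 1 0  (strict inequality in (7.6.1))
print(rank(shift(I2, w)), n - k)   # 0 0  (equality in (7.6.1))
```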

As an immediate consequence of Theorem 7.6.1 we have the following corollary.

Corollary. If $\omega$ is a simple characteristic root of the n × n matrix A, then

$R(\omega I - A) = n - 1,$

i.e. the vector space consisting of the characteristic vectors of A associated with $\omega$, together with the zero vector, has dimensionality 1.

For, by Theorem 7.6.1, $R(\omega I - A) \ge n - 1$, and since $\omega$ is a characteristic root of A we cannot have $R(\omega I - A) = n$.




Exercise 7.6.1. Show that an n × n matrix A possesses at least $n - R(A)$ zero characteristic roots.

From Theorem 7.1.2 we can also deduce the following useful consequence.

Theorem 7.6.2. If A is a non-zero matrix such that

$R(A) = n - m_0(A),$

then A possesses a critical principal minor.

If 0 is not a characteristic root of A, the assertion is obvious; and if 0 is an n-fold characteristic root of A, then A = O, contrary to hypothesis. We assume, therefore, that 0 is a k-fold characteristic root of A, where $0 < k < n$. The characteristic polynomial of A then has the form $\lambda^n + c_1\lambda^{n-1} + \cdots + c_{n-k}\lambda^k$, where $c_{n-k} \ne 0$. But $(-1)^{n-k}c_{n-k}$ is equal to the sum of the (n − k)-rowed principal minors of A. Hence at least one of these minors does not vanish; and since $n - k = R(A)$, the theorem follows.


PROBLEMS ON CHAPTER VII 


1. If A is any rectangular matrix, show that $A^T A$ is symmetric and $\bar{A}^T A$ hermitian.

2. Is either symmetry or skew-symmetry preserved by similarity transformations?

3. Establish the following results. (i) The sum and the difference of two hermitian matrices are again hermitian matrices. (ii) The product of two hermitian matrices is hermitian if and only if the matrices commute. (iii) If A and B are hermitian, then so is AB + BA.

4. Show that the determinant of every hermitian matrix is real. 

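5. Find the characteristic roots of the matrices

(i) $\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$, (ii) $\begin{pmatrix} \cosh\theta & \sinh\theta \\ \sinh\theta & \cosh\theta \end{pmatrix}$, (iii) $\begin{pmatrix} 1/\sqrt{3} & (1+i)/\sqrt{3} \\ (1-i)/\sqrt{3} & -1/\sqrt{3} \end{pmatrix}$.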


6 . Find the characteristic roots of the matrix 


where $\omega =$



7. Let $\omega_1, \ldots, \omega_n$ be the characteristic roots of the matrix $A = (a_{rs})$. Show that

$\sum_{i=1}^{n} \omega_i^2 = \sum_{r,s=1}^{n} a_{rs}a_{sr},$

and also express $\sum_{i=1}^{n} \omega_i^3$ in terms of the elements of A.





8. Find the cube of $\begin{pmatrix} 1 & -18 & 6 \\ 3 & 2 & 1 \\ 6 & 22 & -3 \end{pmatrix}$.

9. Evaluate the matrix $A^6 - 25A^2 + 112A$, where

$A = \begin{pmatrix} 0 & 0 & 2 \\ 2 & 1 & 0 \\ -1 & -1 & 3 \end{pmatrix}.$

10. Evaluate the matrix A^—4A®—A^+2A—51, where A = 



11. Let A be an n × n matrix, $\lambda$ a characteristic root of A, and x a characteristic vector associated with $\lambda$. Show that, for every positive integer k, $A^k x = \lambda^k x$. Deduce that, if f is any rational function for which f(A) is defined, then $f(\lambda)$ is a characteristic root of f(A).

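12. Find the characteristic equation of the matrix

$A = \begin{pmatrix} b & c & a \\ c & a & b \\ a & b & c \end{pmatrix},$

and prove that the matrices

$B = \begin{pmatrix} c & a & b \\ a & b & c \\ b & c & a \end{pmatrix}, \qquad C = \begin{pmatrix} a & b & c \\ b & c & a \\ c & a & b \end{pmatrix}$

have the same characteristic equations as A. Show also that, if BC = CB, then at least two roots of this equation are equal to zero.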

13. If A = Q ”5). express (2A*-12A3+19A»-29A+37I)-i as a 
linear polynomial in A. 

14. If A “ ^ ^ express (A®4-5A®—48A*—1)“^ as a linear 

polynomial in A. 

15. The characteristic roots of the 3 × 3 matrix A are 1, $-1$, 2. Express $A^n$ as a quadratic polynomial in A.

16. Let

$A = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}.$

Show that, for every integer $n \ge 3$, $A^n = A^{n-2} + A^2 - I$. Hence find $A^{50}$.

17. If tr(AX) = 0 for all matrices X, show that A = O. 

18. Find the characteristic roots of the nxn matrix A all of whose 
elements are equal to 1. Also find the minimum polynomial of A. 

19. Express the sum of the diagonal elements of $A^T A$ in terms of the elements of A.

Show that, if A, B are real and symmetric and C is real and skew-symmetric, then $A^2 + B^2 = C^2$ implies A = B = C = O. Does this conclusion still hold if A is not necessarily symmetric?

20. Give an easy proof of the theorem of Cayley and Hamilton for the 
case of matrices which are similar to diagonal matrices. 




21. Show that the following statements relating to an n × n matrix A are equivalent: (i) $A^n = O$; (ii) $A^k = O$ for some positive integer k; (iii) all characteristic roots of A are equal to 0.

22. Show that there exist no matrices A, B such that $AB - BA = I$.

23. Show that the constant term in the minimum polynomial of A 
vanishes if and only if A is singular. 

24. Show that the zeros of the minimum polynomial of a diagonal matrix 
are distinct. Deduce that, if A is similar to a diagonal matrix, then the zeros 
of the minimum polynomial of A are distinct. 

25. Prove the identity

$M(A^T B - BA^T) - M(AB - BA) = \operatorname{tr}\{(A^T A - AA^T)(B^T B - BB^T)\},$

where M(X) is defined by the equation $M(X) = \operatorname{tr}(X^T X)$.

26. Suppose that all characteristic roots of $I - A$ are less than 1 in modulus. Prove that $0 < |\det A| < 2^n$; and show that this result is best possible.

27. Let $A = (a_{rs})$ be a real n × n matrix and suppose that

$a_{rr} > \sum_{s=1,\, s \ne r}^{n} |a_{rs}| \quad (r = 1, \ldots, n).$

Show that all characteristic roots of A have positive real parts.

28. A real matrix A has all its elements equal to zero except those of the form $a_{i,i}$, $a_{i,i+1}$, $a_{i+1,i}$. If, for each value of i, $a_{i,i+1}$ and $a_{i+1,i}$ have the same sign, show that there exists a real diagonal matrix D such that $D^{-1}AD$ is symmetric. Deduce that all characteristic roots of A are real.

29. Suppose that 0 is a simple characteristic root of A. Show that $\operatorname{tr} A^* \ne 0$. Show also that, if $|A| = 0$ and $\operatorname{tr} A^* \ne 0$, then 0 is a simple characteristic root of A.

30. Let the distinct characteristic roots of the n × n matrix A be $\lambda_1, \ldots, \lambda_k$. Show that

$n(k-1) \le \sum_{i=1}^{k} R(\lambda_i I - A) \le k(n-1).$

31. The characteristic roots of an (n + 1) × (n + 1) matrix A are 0 and the nth roots of unity. Prove that

2I~A = 

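32. Find the characteristic polynomials of the matrices

(i) $\begin{pmatrix} a_1 & a_2 & a_3 & \cdots & a_{n-1} & a_n \\ -1 & 0 & 0 & \cdots & 0 & 0 \\ 0 & -1 & 0 & \cdots & 0 & 0 \\ 0 & 0 & -1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & -1 & 0 \end{pmatrix}$,

(ii) $\begin{pmatrix} 0 & 0 & 0 & \cdots & 0 & -a_n \\ 1 & 0 & 0 & \cdots & 0 & -a_{n-1} \\ 0 & 1 & 0 & \cdots & 0 & -a_{n-2} \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 1 & -a_1 \end{pmatrix}$,

(iii) $\begin{pmatrix} a_1 & 1 & 0 & \cdots & 0 & 0 \\ a_2 & 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ a_{n-1} & 0 & 0 & \cdots & 0 & 1 \\ a_n & 0 & 0 & \cdots & 0 & 0 \end{pmatrix}$.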


33. Show that, if $A^2 = I$, then every characteristic root of A is +1 or $-1$. Let the 3 × 3 matrix A satisfy the equation $A^2 = I_3$ and suppose that all characteristic roots of A are equal to 1. Show that $A = I_3$.

34. Show that any real 2 × 2 matrix A which satisfies the equation $A^2 = -I$ is similar to the matrix $\mathrm{dg}(i, -i)$.

35. Show that, if A is a non-singular n × n matrix, then the coefficient of $\lambda$ in the characteristic polynomial $\chi(\lambda)$ of A is equal to $(-1)^{n-1}|A|\operatorname{tr}(A^{-1})$.

36. Establish the identity

$|\lambda I - A| = \lambda^3 - \lambda^2\operatorname{tr} A + \lambda\operatorname{tr} A^* - |A|$

for every 3 × 3 matrix A.

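37. Show that, for any 3 × 3 matrices A and B,

$|\lambda A - B| = \lambda^3|A| - \lambda^2\operatorname{tr}(A^*B) + \lambda\operatorname{tr}(B^*A) - |B|.$

Deduce that $|A + B| = |A| + \operatorname{tr}(A^*B) + \operatorname{tr}(B^*A) + |B|$.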

38. Let $\lambda^n + c_1\lambda^{n-1} + \cdots + c_{n-1}\lambda + c_n$ denote the characteristic polynomial of A. Deduce, from the Cayley-Hamilton theorem, that

$A^* = (-1)^{n-1}\{A^{n-1} + c_1A^{n-2} + \cdots + c_{n-1}I\}.$

Also, derive the Cayley-Hamilton theorem from this identity.
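For n = 3 we have $(-1)^{n-1} = 1$, so the identity of Problem 38 reads $A^* = A^2 + c_1A + c_2I$. Here $c_1 = -\operatorname{tr} A$ and $c_2$ is the sum of the 2-rowed principal minors. The sketch below compares this formula with the adjugate built from cofactors. It uses one non-singular and one singular integer matrix, chosen only for illustration. The helper names are our own.

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def adjugate_by_cofactors(M):
    """A* = transpose of the matrix of cofactors (3 x 3 case)."""
    def minor(r, s):
        sub = [row[:s] + row[s + 1:] for i, row in enumerate(M) if i != r]
        return sub[0][0] * sub[1][1] - sub[0][1] * sub[1][0]
    # entry (r, s) of A* is the cofactor of the element in position (s, r)
    return [[(-1) ** (r + s) * minor(s, r) for s in range(3)] for r in range(3)]

def adjugate_by_cayley_hamilton(M):
    """A* = A^2 + c1 A + c2 I, where |lambda I - A| = lambda^3 + c1 lambda^2 + c2 lambda + c3."""
    c1 = -(M[0][0] + M[1][1] + M[2][2])
    c2 = (M[0][0] * M[1][1] - M[0][1] * M[1][0]
          + M[0][0] * M[2][2] - M[0][2] * M[2][0]
          + M[1][1] * M[2][2] - M[1][2] * M[2][1])   # sum of 2-rowed principal minors
    M2 = matmul(M, M)
    return [[M2[i][j] + c1 * M[i][j] + (c2 if i == j else 0) for j in range(3)]
            for i in range(3)]

for A in ([[1, 2, 0], [0, 1, 3], [4, 0, 1]], [[1, 2, 3], [4, 5, 6], [7, 8, 9]]):
    print(adjugate_by_cofactors(A) == adjugate_by_cayley_hamilton(A))   # True
```

The second matrix is singular. The agreement there illustrates that the identity holds even when $A^{-1}$ does not exist.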

39. Let A be a matrix which satisfies the relation $A^2 = I$. Let U be the space of vectors x such that Ax = x and U′ the space of vectors x such that Ax = −x. Show that U and U′ are complements.

40. Let $\omega_1, \ldots, \omega_k$ be a set of distinct characteristic roots of the n × n matrix A and let, for $1 \le i \le k$, $U_i$ denote the space of vectors x such that $Ax = \omega_i x$. Show that, if $x_1 + \cdots + x_k = 0$, where $x_i \in U_i$, then

$x_1 = \cdots = x_k = 0.$



and let $\chi, \chi_1, \chi_2$ be the characteristic polynomials of A, $A_1$, $A_2$ respectively and $\mu, \mu_1, \mu_2$ the minimum polynomials of A, $A_1$, $A_2$ respectively. Show that $\chi = \chi_1\chi_2$ and that $\mu$ is the least common multiple of $\mu_1$ and $\mu_2$. Find the characteristic polynomial and the minimum polynomial of the matrix

$\begin{pmatrix} 6 & -1 & 0 & 0 & 0 \\ 6 & 0 & -1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 3 & 1 \\ 0 & 0 & 0 & -1 & 1 \end{pmatrix}.$












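42. Let

$A = \begin{pmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & \omega & \omega^2 & \cdots & \omega^{n-1} \\ 1 & \omega^2 & \omega^4 & \cdots & \omega^{2(n-1)} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & \omega^{n-1} & \omega^{2(n-1)} & \cdots & \omega^{(n-1)(n-1)} \end{pmatrix},$

where $\omega = e^{2\pi i/n}$. By considering $\bar{A}^T A$, determine the value of $|\det A|$.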

43. Let f(x) be a monic polynomial and let $\theta_1, \ldots, \theta_n$ denote the roots of the equation f(x) = 0. Show that the discriminant $\Delta$ of this equation is given by

$\Delta = (-1)^{n(n-1)/2}\prod_{r=1}^{n} f'(\theta_r).$

Deduce that $(\det A)^2 = (-1)^{(n-1)(n-2)/2}\,n^n$, where A denotes the matrix defined in the preceding question.


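44. Show that the matrix A defined in No. 42 satisfies

$n^{-1}A^2 = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 & 0 & 0 \\ 0 & 0 & 0 & \cdots & 0 & 0 & 1 \\ 0 & 0 & 0 & \cdots & 0 & 1 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\ 0 & 0 & 1 & \cdots & 0 & 0 & 0 \\ 0 & 1 & 0 & \cdots & 0 & 0 & 0 \end{pmatrix}.$

Hence obtain the expression for $(\det A)^2$ given in the preceding problem. Also deduce that $A^4$ is a scalar matrix and show that all characteristic roots of A are to be found among the numbers $\pm\sqrt{n}$, $\pm i\sqrt{n}$.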

45. Suppose that the elements of $A^m$ are bounded as $m \to \infty$. Show that the modulus of every characteristic root of A is less than or equal to 1. Show also that the converse inference is false.

46. Let G be a 2 × 2 matrix; let $(x_1, x_2)$, $(y_1, y_2)$ be two row vectors; and write $(x_1', x_2') = (x_1, x_2)G$, $(y_1', y_2') = (y_1, y_2)G$.

Show that the three components of the row vector on the right of the equation

$(x_1y_1,\ x_1y_2 + x_2y_1,\ x_2y_2)\,\Gamma = (x_1'y_1',\ x_1'y_2' + x_2'y_1',\ x_2'y_2')$

are all linear combinations of the three components of the row vector on the left, so that the equation thereby defines the 3 × 3 matrix $\Gamma$. Obtain the relation $\operatorname{tr}\Gamma = (\operatorname{tr} G)^2 - |G|$.

If the characteristic roots of G are $\lambda$ and $\mu$, find the characteristic roots of $\Gamma$; and deduce that $|\Gamma| = |G|^3$.

47. Show that the n × n matrix $A = (a_{rs})$ is non-singular if and only if there exists a matrix $B = (b_{rs})$ such that

$\Big|\sum_{k=1}^{n} a_{rk}b_{kr}\Big| - \sum_{s=1,\, s \ne r}^{n}\Big|\sum_{k=1}^{n} a_{rk}b_{ks}\Big| > 0 \quad (r = 1, \ldots, n).$

Show, furthermore, that for any such matrix B, 

|detA||detB| > 

Obtain theorems arising from the special cases B = I and B = A^. 










48. The characteristic roots of the matrix A are $\omega_1, \ldots, \omega_n$ and $|\omega_1| > |\omega_k|$ $(k = 2, \ldots, n)$. Show that

$\lim_{m\to\infty}\{\operatorname{tr}(A^m)\}^{1/m} = \omega_1.$

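49. Let A be a given n × n matrix. Show that, for every value of $\lambda$,

$(A - \lambda I)^* = \theta_0 I + \theta_1 A + \cdots + \theta_{n-1}A^{n-1},$

where $\theta_0, \theta_1, \ldots, \theta_{n-1}$ are polynomials in $\lambda$. Find these polynomials when

$A = \begin{pmatrix} 0 & 1 & 0 \\ -1 & 0 & 1 \\ 0 & -1 & 0 \end{pmatrix}.$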


50. $A = (a_{rs})$ is a complex n × n matrix; $p_1, \ldots, p_n$ are positive numbers; and

$K = \max_{1 \le r \le n}\frac{|a_{r1}|p_1 + \cdots + |a_{rn}|p_n}{p_r}.$

Show that every characteristic root $\omega$ of A satisfies the inequality $|\omega| \le K$.

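51. Let $A_1$, $B_1$ be 3 × 3 matrices and suppose that, for all values of $\lambda$ and $\mu$,

$(\lambda A_1 + \mu B_1)^* = \lambda^2A_2 + \lambda\mu C_2 + \mu^2B_2,$

$|\lambda A_1 + \mu B_1| = \lambda^3\Delta + \lambda^2\mu\Theta + \lambda\mu^2\Theta' + \mu^3\Delta'.$

Show that $C_2A_1 + A_2B_1 = \Theta I$, $C_2B_1 + B_2A_1 = \Theta' I$, and deduce that $|C_2| = \Theta\Theta' - \Delta\Delta'$.

Also show that, if $(\lambda A_2 + \mu B_2)^* = \lambda^2A_3 + \lambda\mu C_3 + \mu^2B_3$, then

$A_2C_3 + \Delta B_2A_1 = \Delta\Theta' I, \qquad B_2C_3 + \Delta' A_2B_1 = \Delta'\Theta I.$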

52. Show that, for any matrix A, any distinct numbers $\omega_1, \ldots, \omega_k$, and any polynomial f of degree $\le k - 1$,

$f(A) = \sum_{i=1}^{k} f(\omega_i)\prod_{j=1,\, j \ne i}^{k}\frac{A - \omega_j I}{\omega_i - \omega_j}.$

Show also that this identity continues to hold for a polynomial f of any degree, provided that $\omega_1, \ldots, \omega_k$ are taken as the distinct characteristic roots of A and provided further that the minimum polynomial of A is a product of distinct linear factors. (This result is known as Sylvester's interpolation formula.)
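Sylvester's interpolation formula of Problem 52 is easy to test with exact rational arithmetic. The sketch below evaluates its right-hand side and compares it with f(A) computed directly by Horner's rule. The matrices, the polynomials and all helper names are our own illustrative choices. The first check uses the distinct characteristic roots 2 and 3 of an upper-triangular matrix and a cubic f. The second uses three arbitrary distinct numbers and a quadratic f.

```python
from fractions import Fraction

def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def add(X, Y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

def scale(c, X):
    return [[c * a for a in row] for row in X]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def poly_at_scalar(coeffs, x):
    """f(x) by Horner's rule; coeffs[m] is the coefficient of x**m."""
    value = Fraction(0)
    for c in reversed(coeffs):
        value = value * x + c
    return value

def poly_at_matrix(coeffs, A):
    """f(A) by Horner's rule."""
    n = len(A)
    value = scale(Fraction(0), identity(n))
    for c in reversed(coeffs):
        value = add(matmul(value, A), scale(Fraction(c), identity(n)))
    return value

def sylvester(coeffs, A, omegas):
    """sum_i f(w_i) * prod_{j != i} (A - w_j I) / (w_i - w_j)."""
    n = len(A)
    total = scale(Fraction(0), identity(n))
    for i, wi in enumerate(omegas):
        term = scale(poly_at_scalar(coeffs, Fraction(wi)), identity(n))
        for j, wj in enumerate(omegas):
            if j != i:
                factor = add(A, scale(Fraction(-wj), identity(n)))    # A - w_j I
                term = matmul(term, scale(1 / Fraction(wi - wj), factor))
        total = add(total, term)
    return total

f = [5, -2, 0, 1]                 # f(x) = x^3 - 2x + 5
A = [[2, 1], [0, 3]]              # distinct characteristic roots 2 and 3
print(sylvester(f, A, [2, 3]) == poly_at_matrix(f, A))           # True

g = [1, 0, 1]                     # g(x) = x^2 + 1, degree 2 <= k - 1 with k = 3
B = [[1, 2], [3, 4]]
print(sylvester(g, B, [0, 1, 5]) == poly_at_matrix(g, B))        # True
```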

53. Show that $R(A) = R(A^2)$ if and only if $R(A) = n - m_0(A)$.

54. Show that $R(A^m) = R(A^{m+1}) = R(A^{m+2}) = \cdots$, where $m = m_0(A)$.



VIII 


ORTHOGONAL AND UNITARY MATRICES 


The first two sections of this chapter are devoted to the investigation of two special classes of matrices, namely orthogonal matrices and unitary matrices. The remaining two sections deal with the use of orthogonal matrices in the algebraic manipulation of rotations in two and three dimensions.


8.1. Orthogonal matrices 

8.1.1. Definition 8.1.1. A matrix A is orthogonal if it is real and

$A^T A = I.$ (8.1.1)


Equation (8.1.1) implies that A is non-singular. It may therefore be written as $A^{-1} = A^T$, and hence also as

$AA^T = I.$ (8.1.2)

By equating corresponding elements on both sides in (8.1.1) and (8.1.2), we can rewrite these relations in the more explicit form

$\sum_{k=1}^{n} a_{kr}a_{ks} = \delta_{rs} \quad (r, s = 1, \ldots, n),$ (8.1.3)

$\sum_{k=1}^{n} a_{rk}a_{sk} = \delta_{rs} \quad (r, s = 1, \ldots, n),$ (8.1.4)

where $A = (a_{rs})$.

Theorem 8.1.1. If A is orthogonal, then $|A| = \pm 1$.

For, by definition, $A^T A = I$. Hence $|A^T|\,|A| = 1$, i.e. $|A|^2 = 1$.

Examples of orthogonal matrices with determinants 1 and $-1$ respectively are easily given. Two such matrices are $I_2$ and $\mathrm{dg}(1, -1)$.

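Exercise 8.1.1. Show that if $A = (a_{rs})$ is an orthogonal matrix, then the (unique) solution of the system of equations

$a_{11}x_1 + \cdots + a_{1n}x_n = \xi_1, \ \ldots, \ a_{n1}x_1 + \cdots + a_{nn}x_n = \xi_n$

is

$x_1 = a_{11}\xi_1 + \cdots + a_{n1}\xi_n, \ \ldots, \ x_n = a_{1n}\xi_1 + \cdots + a_{nn}\xi_n.$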






Theorem 8.1.2. A real matrix is orthogonal if and only if its columns (or rows) form an orthonormal set of vectors.†

A real matrix $A = (a_{rs})$ is orthogonal if and only if the relations (8.1.3) hold, and that is precisely the condition for the columns of A to form an orthonormal set. Similarly, (8.1.4) is precisely the condition for the rows of A to form an orthonormal set.

By Theorem 8.1.2 it is, for instance, obvious that the matrix

$\begin{pmatrix} 1/\sqrt{2} & -1/\sqrt{2} \\ 1/\sqrt{2} & 1/\sqrt{2} \end{pmatrix}$

is orthogonal.

Two immediate inferences from Theorem 8.1.2 are as follows.

Corollary 1. If the columns of a real matrix form an orthonormal set, then so do the rows; and conversely.

Corollary 2. If the order of the columns (or rows) of an orthogonal matrix is changed, then the resulting matrix is again orthogonal.

Exercise 8.1.2. Deduce Theorem 8.1.2 from Definition 8.1.1 and 
Exercise 3.3.3 (p. 80). 

Exercise 8.1.3. Show that if any rows (or columns) of an orthogonal matrix are multiplied by $-1$, then the resulting matrix is again orthogonal.

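Exercise 8.1.4. Let

$T = \begin{pmatrix} \lambda & \lambda' & \lambda'' \\ \mu & \mu' & \mu'' \\ \nu & \nu' & \nu'' \end{pmatrix}$

be an orthogonal matrix with positive determinant. Using the equation $T^T = T^{-1}$, show that

$\lambda = \mu'\nu'' - \mu''\nu', \quad \mu = \nu'\lambda'' - \nu''\lambda', \quad \nu = \lambda'\mu'' - \lambda''\mu'.$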

Theorem 8.1.3. Let $x_1$ be a real unit vector. Then there exists an orthogonal matrix A having $x_1$ as its first column. Furthermore, the sign of |A| can be chosen at will.

In view of the theorem on orthonormal bases (Theorem 2.5.5, p. 66) there exist vectors $x_2, \ldots, x_n$ such that $x_1, x_2, \ldots, x_n$ is an orthonormal set. The matrix A having $x_1, x_2, \ldots, x_n$ as its columns is then orthogonal by Theorem 8.1.2. If necessary, we can adjust the sign of |A| by replacing $x_n$ by $-x_n$.
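The proof becomes a construction once a particular orthonormal completion is chosen. The sketch below assumes one such choice: the Gram-Schmidt process applied to $x_1$ followed by the standard basis vectors $e_1, e_2, \ldots$, skipping any vector that already lies in the span. This is only one way of realizing Theorem 2.5.5. The sign of |A| is then fixed by changing the sign of the last column. The helper names are our own, and n ≥ 2 is assumed.

```python
import math

def det(M):
    """Determinant by Gaussian elimination with partial pivoting."""
    A = [[float(x) for x in row] for row in M]
    n = len(A)
    d = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        if A[p][c] == 0:
            return 0.0
        if p != c:
            A[c], A[p] = A[p], A[c]
            d = -d
        d *= A[c][c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for k in range(c, n):
                A[r][k] -= f * A[c][k]
    return d

def complete_to_orthogonal(x1, sign=1):
    """Orthogonal matrix with first column x1 (a real unit vector, n >= 2) and |A| of the given sign."""
    n = len(x1)
    basis = [[float(a) for a in x1]]
    for k in range(n):                            # try e_1, e_2, ... in turn
        v = [1.0 if i == k else 0.0 for i in range(n)]
        for u in basis:                           # remove the components along earlier vectors
            d = sum(a * b for a, b in zip(u, v))
            v = [a - d * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-10:                          # skip e_k if it lies in the span already
            basis.append([a / norm for a in v])
        if len(basis) == n:
            break
    A = [[basis[j][i] for j in range(n)] for i in range(n)]   # column j is basis[j]
    if (det(A) > 0) != (sign > 0):                # replace x_n by -x_n
        for i in range(n):
            A[i][n - 1] = -A[i][n - 1]
    return A

x1 = [1/3, 2/3, 2/3]
A = complete_to_orthogonal(x1)
print(det(A))                                     # close to 1.0
```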

† In view of this result, the term ‘orthonormal matrix’ might be regarded as more appropriate than ‘orthogonal matrix’, but custom has firmly established the latter usage.




Theorem 8.1.4. If A and B are orthogonal matrices, then so are $A^T$, $A^{-1}$, and AB.

Since A is orthogonal, $AA^T = I$. Hence

$(A^T)^T A^T = I,$

and so $A^T$ is orthogonal. Moreover, $A^{-1} = A^T$ and so $A^{-1}$ is orthogonal. Again

$(AB)^T AB = B^T A^T A B = B^T B = I,$
and therefore AB is orthogonal.

An important property of orthogonal matrices can be formulated in terms of linear substitutions.

Theorem 8.1.5. Let $x = (x_1, \ldots, x_n)^T$, $y = (y_1, \ldots, y_n)^T$. The real matrix A is then orthogonal if and only if the substitution x = Ay transforms the polynomial $x_1^2 + \cdots + x_n^2$ into $y_1^2 + \cdots + y_n^2$.

The substitution x = Ay transforms $x_1^2 + \cdots + x_n^2 = x^T x$ into $(Ay)^T Ay = y^T A^T A y$. Now if A is orthogonal, this polynomial is clearly equal to $y^T y = y_1^2 + \cdots + y_n^2$. On the other hand, if the polynomials $y^T A^T A y$ and $y^T y$ are equal, then, since $A^T A$ is symmetric, $A^T A = I$, i.e. A is orthogonal. The theorem is therefore proved.

Consider, for example, the substitution x = Ay, where A is the 2 × 2 matrix given on p. 223. Then

$x_1 = (y_1 - y_2)/\sqrt{2}, \quad x_2 = (y_1 + y_2)/\sqrt{2},$

and it can be verified at once that this substitution transforms $x_1^2 + x_2^2$ into $y_1^2 + y_2^2$.
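Written out in full, the verification is a one-line expansion in which the cross terms $\mp y_1y_2$ cancel:

```latex
x_1^2 + x_2^2
  = \tfrac{1}{2}(y_1 - y_2)^2 + \tfrac{1}{2}(y_1 + y_2)^2
  = \tfrac{1}{2}\bigl(2y_1^2 + 2y_2^2\bigr)
  = y_1^2 + y_2^2 .
```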

Theorem 8.1.6. A real matrix A is orthogonal if and only if it preserves length, i.e. if and only if

$|Ax| = |x|$ (8.1.5)

for every real vector x.

Equation (8.1.5) is equivalent to

$(Ax, Ax) = (x, x),$ (8.1.6)

i.e. $x^T(A^T A - I)x = 0.$ (8.1.7)

If A is orthogonal, this relation clearly holds for all x. Suppose, on the other hand, that (8.1.7), and so (8.1.5), holds for all x. Writing $B = (b_{rs}) = A^T A - I$ and putting $x = e_k$ in (8.1.7) we obtain, by Exercise 3.3.9 (p. 85),

$b_{kk} = 0 \quad (k = 1, \ldots, n).$ (8.1.8)




Next, let $k \ne l$ and put $x = e_k + e_l$ in (8.1.7). Again using Exercise 3.3.9 we obtain

$b_{kk} + b_{kl} + b_{lk} + b_{ll} = 0 \quad (k, l = 1, \ldots, n;\ k \ne l).$

Hence, in view of (8.1.8) and the obvious symmetry of B,

$b_{kl} = 0 \quad (k, l = 1, \ldots, n;\ k \ne l).$ (8.1.9)

By (8.1.8) and (8.1.9) it follows that B = O, i.e. A is orthogonal. This completes the proof.

Corollary 1. A real matrix A is orthogonal if and only if it preserves separation, i.e. if and only if

$|Ax - Ay| = |x - y|$ (8.1.10)

for all real vectors x, y.

If A is orthogonal, then, by (8.1.5),

$|Ax - Ay| = |A(x - y)| = |x - y|.$

On the other hand, if (8.1.10) holds for all real x, y, then (8.1.5) holds a fortiori, and A is orthogonal.

Exercise 8.1.5. Give a geometrical interpretation of Corollary 1 for the case n = 3.

Corollary 2. A real matrix A is orthogonal if and only if it preserves inner products, i.e. if and only if

$(Ax, Ay) = (x, y)$ (8.1.11)

for all real vectors x, y.

If A is orthogonal, then

$(Ax, Ay) = (Ay)^T Ax = y^T A^T A x = y^T x = (x, y).$

On the other hand, if (8.1.11) holds for all real x, y, then (8.1.6), and so (8.1.5), holds a fortiori. Hence A is orthogonal.
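Corollaries 1 and 2 are easy to see numerically. The sketch below compares a rotation matrix, which is orthogonal, with a stretch, which is not. It uses two real vectors chosen arbitrarily for illustration. The helper names are our own.

```python
import math

def apply(A, x):
    """The vector Ax."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def inner(x, y):
    """Real inner product (x, y)."""
    return sum(a * b for a, b in zip(x, y))

def dist(x, y):
    """Separation |x - y|."""
    d = [a - b for a, b in zip(x, y)]
    return math.sqrt(inner(d, d))

alpha = 0.7
Q = [[math.cos(alpha), -math.sin(alpha)],
     [math.sin(alpha),  math.cos(alpha)]]     # orthogonal
S = [[2.0, 0.0], [0.0, 1.0]]                 # a stretch: not orthogonal
x, y = [3.0, -1.0], [0.5, 2.0]

print(inner(x, y), inner(apply(Q, x), apply(Q, y)))   # -0.5 twice, up to rounding
print(inner(apply(S, x), apply(S, y)))                # 4.0: S changes inner products
print(dist(x, y), dist(apply(Q, x), apply(Q, y)))     # equal up to rounding
```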

8.1.2. The next series of results is concerned with the values of characteristic roots of orthogonal matrices.

Theorem 8.1.7. If $\lambda$ is a characteristic root of the orthogonal matrix A, then so is $\lambda^{-1}$.

It should be noted that, since $|A| \ne 0$, none of the characteristic roots of A is equal to zero.


By Theorem 7.3.3 (p. 202), we know that $\lambda^{-1}$ is a characteristic root of $A^{-1}$, and so of $A^T$. Hence

$0 = |\lambda^{-1}I - A^T| = |(\lambda^{-1}I - A)^T| = |\lambda^{-1}I - A|,$
and therefore $\lambda^{-1}$ is a characteristic root of A.

The next theorem we prove is due to Brioschi (1854).

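Theorem 8.1.8. Every characteristic root of an orthogonal matrix has unit modulus.

Let $\lambda$ be a characteristic root of the orthogonal matrix A, and let x be a non-zero vector such that

$Ax = \lambda x.$ (8.1.12)

Then, since A is real, $A\bar{x} = \bar{\lambda}\bar{x}$, and so

$\bar{\lambda}\bar{x}^T = \bar{x}^T A^T.$ (8.1.13)

By (8.1.12) and (8.1.13) we obtain

$\bar{\lambda}\lambda\,\bar{x}^T x = \bar{x}^T A^T A x = \bar{x}^T x.$

Since $\bar{x}^T x \ne 0$, this implies that $|\lambda|^2 = \bar{\lambda}\lambda = 1$, i.e. $|\lambda| = 1$.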
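For 2 × 2 matrices the characteristic roots can be written down with the quadratic formula, so Theorem 8.1.8 can be observed directly. The sketch below uses a rotation (|A| = 1) and a reflection (|A| = −1) through an arbitrarily chosen angle. It shows unit-modulus roots, and for the rotation the conjugate pair $e^{\pm i\alpha}$. The helper name is our own.

```python
import cmath
import math

def char_roots_2x2(M):
    """Roots of |lambda I - M| = lambda^2 - (tr M) lambda + |M| for a 2 x 2 matrix M."""
    (p, q), (r, s) = M
    t, d = p + s, p * s - q * r
    disc = cmath.sqrt(t * t - 4 * d)
    return (t + disc) / 2, (t - disc) / 2

alpha = 1.1
rotation = [[math.cos(alpha), -math.sin(alpha)],
            [math.sin(alpha),  math.cos(alpha)]]      # |A| = 1
reflection = [[math.cos(alpha),  math.sin(alpha)],
              [math.sin(alpha), -math.cos(alpha)]]    # |A| = -1

for M in (rotation, reflection):
    print([round(abs(z), 12) for z in char_roots_2x2(M)])   # [1.0, 1.0] each time
print(cmath.phase(char_roots_2x2(rotation)[0]))             # close to alpha = 1.1
```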

Exercise 8.1.6. Give a proof of the above theorem without making use 
of matrix notation. 

As an immediate consequence of Theorem 8.1.8 we have the following result.

Corollary. The non-real characteristic roots of an orthogonal matrix occur in conjugate pairs of the type $e^{\pm i\alpha}$, where $0 < \alpha < \pi$.

We may also note some further relations concerning characteristic roots.

Theorem 8.1.9. Let A be an orthogonal n × n matrix. (i) If |A| = 1 and n is odd or if |A| = −1 and n is even, then 1 is a characteristic root of A. (ii) If |A| = −1, then −1 is a characteristic root of A.

We have

$A^T(I - A) = A^T - I = -(I - A)^T,$

and therefore

$|A| \cdot |I - A| = (-1)^n|I - A|, \qquad |I - A|\{|A| - (-1)^n\} = 0.$

Hence (i) follows. Again,

$A^T(I + A) = A^T + I = (I + A)^T,$

$|A| \cdot |I + A| = |I + A|,$

and (ii) follows.

8.1.3. In § 8.1.1 we obtained a number of conditions necessary and sufficient to ensure that a matrix should be orthogonal. These conditions do not lead to a convenient method for constructing orthogonal matrices, but the next theorem (discovered by Cayley in 1846) provides us with such a method.

Theorem 8.1.10. If S is a real skew-symmetric matrix, then I + S is non-singular, and the matrix

$A = (I - S)(I + S)^{-1}$

is orthogonal.

By the corollary to Theorem 7.5.1 (p. 209), all characteristic roots of S are purely imaginary. Hence −1 is not a characteristic root of S, and therefore $|I + S| \ne 0$.

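Since $S^T = -S$ we have

$(I - S)^T = I + S, \qquad (I + S)^T = I - S,$

and so, using equation (3.7.7) on p. 99, we obtain

$A^T = \{(I - S)(I + S)^{-1}\}^T = \{(I + S)^{-1}\}^T(I - S)^T = \{(I + S)^T\}^{-1}(I + S) = (I - S)^{-1}(I + S).$

Hence, since I + S and I − S commute,

$A^T A = (I - S)^{-1}(I + S)(I - S)(I + S)^{-1} = (I - S)^{-1}(I - S)(I + S)(I + S)^{-1} = I,$
and the theorem is proved.†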
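Cayley's construction can be carried out exactly in rational arithmetic. The sketch below forms $A = (I - S)(I + S)^{-1}$ for one real skew-symmetric 3 × 3 matrix S, built from arbitrarily chosen parameters a, b, c. It then confirms that $A^T A = I$ holds exactly. The inverse is computed by Gauss-Jordan elimination, and all helper names are our own.

```python
from fractions import Fraction

def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def inverse(M):
    """Exact inverse by Gauss-Jordan elimination (raises StopIteration if M is singular)."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + identity(n)[i] for i, row in enumerate(M)]
    for c in range(n):
        p = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [x / pivot for x in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def cayley(S):
    """A = (I - S)(I + S)^{-1} for a real skew-symmetric S."""
    n = len(S)
    I = identity(n)
    I_minus_S = [[I[i][j] - S[i][j] for j in range(n)] for i in range(n)]
    I_plus_S = [[I[i][j] + S[i][j] for j in range(n)] for i in range(n)]
    return matmul(I_minus_S, inverse(I_plus_S))

a, b, c = 1, 2, 3                          # arbitrary parameters
S = [[0, a, b], [-a, 0, c], [-b, -c, 0]]   # S^T = -S
A = cayley(S)
print(matmul(transpose(A), A) == identity(3))   # True: A is orthogonal
```

Because $|I - S| = |(I + S)^T| = |I + S|$, the matrix produced in this way always has determinant 1.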

Exercise 8.1.7. Obtain an orthogonal matrix of order 3 involving 3 
independent parameters. 

8.1.4. We know, by Corollary 1 to Theorem 8.1.6, that every transformation x' = Ax, where A is an orthogonal matrix, preserves separation. We shall show next that the converse statement is also, in a sense, true.

† For further results bearing on the relation between orthogonal and skew-symmetric matrices, see Ferrar, 2, 164-7. For a generalization of Theorem 8.1.10, see Turnbull, 18, 168-9.




Theorem 8.1.11. Let f be a transformation of the real total vector space $\mathfrak{V}_n$ into itself. If

$f(0) = 0$ (8.1.14)

and, for all $x, y \in \mathfrak{V}_n$,

$|f(x) - f(y)| = |x - y|,$ (8.1.15)

then f(x) = Ax, where A is an orthogonal matrix.

To prove the theorem we begin by showing that f preserves inner products. For, by (8.1.14) and (8.1.15), we have

$|f(x)| = |f(x) - f(0)| = |x|$

for all $x \in \mathfrak{V}_n$, and so

$(f(x), f(x)) = (x, x).$ (8.1.16)

Again, by (8.1.15) we have, for all $x, y \in \mathfrak{V}_n$,

$(f(x) - f(y), f(x) - f(y)) = (x - y, x - y).$

Therefore, in view of (8.1.16),

$(f(x), f(y)) = (x, y)$ (8.1.17)

for all $x, y \in \mathfrak{V}_n$. In particular

$(f(e_i), f(e_j)) = (e_i, e_j) = \delta_{ij} \quad (i, j = 1, \ldots, n),$

and the vectors $f(e_1), \ldots, f(e_n)$ form an orthonormal set. Hence the matrix A, defined by the relations $A_{*i} = f(e_i)$ $(i = 1, \ldots, n)$, is orthogonal by Theorem 8.1.2. Furthermore,

$f(e_i) = Ae_i \quad (i = 1, \ldots, n).$ (8.1.18)

Now, if the vectors $u_1, \ldots, u_n$ constitute an orthonormal set, then, by Exercise 2.5.4 (p. 66), any vector v can be expressed in the form

$v = \sum_{i=1}^{n} (u_i, v)u_i.$

Hence, by (8.1.17) and (8.1.18),

$f(x) = \sum_{i=1}^{n} (f(e_i), f(x))f(e_i) = \sum_{i=1}^{n} (e_i, x)Ae_i = A\sum_{i=1}^{n} (e_i, x)e_i = Ax,$

and the assertion is therefore established.


Theorem 8.1.11 states that a transformation which preserves separation and leaves the zero vector invariant is an orthogonal transformation. By dispensing with the second requirement we are led to the following modified statement.

Corollary. Let f be a transformation of the real total vector space $\mathfrak{V}_n$ into itself. If, for all $x, y \in \mathfrak{V}_n$, $|f(x) - f(y)| = |x - y|$, then f(x) = Ax + c, where A is an orthogonal matrix and c a fixed vector.

Putting g(x) = f(x) − f(0), we see that g(0) = 0 and $|g(x) - g(y)| = |x - y|$ for all real x, y. Hence, by Theorem 8.1.11, g(x) = Ax, where A is an orthogonal matrix, i.e.

$f(x) = Ax + f(0).$
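The proof of the corollary is constructive: c = f(0), and the columns of A are $g(e_1), \ldots, g(e_n)$. In the sketch below, the map f is a hypothetical example built from a rotation about the third axis followed by a translation. After construction it is used only through its values, as in the proof. All names are our own.

```python
import math

def make_isometry(alpha, c):
    """A hypothetical distance-preserving map x -> Qx + c, Q a rotation about the third axis."""
    Q = [[math.cos(alpha), -math.sin(alpha), 0.0],
         [math.sin(alpha),  math.cos(alpha), 0.0],
         [0.0, 0.0, 1.0]]
    return lambda x: [sum(Q[i][k] * x[k] for k in range(3)) + c[i] for i in range(3)]

def recover(f, n=3):
    """Return (A, c) with f(x) = Ax + c, using only values of f."""
    c = f([0.0] * n)                                        # c = f(0)
    g = lambda x: [a - b for a, b in zip(f(x), c)]          # g(0) = 0, g preserves separation
    cols = [g([1.0 if i == j else 0.0 for i in range(n)]) for j in range(n)]
    A = [[cols[j][i] for j in range(n)] for i in range(n)]  # column j of A is g(e_j)
    return A, c

f = make_isometry(0.4, [1.0, -2.0, 0.5])
A, c = recover(f)
x = [0.3, -1.2, 2.0]
Ax_plus_c = [sum(A[i][k] * x[k] for k in range(3)) + c[i] for i in range(3)]
print(max(abs(u - v) for u, v in zip(Ax_plus_c, f(x))))     # of the order of rounding error
```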

The result just proved has a simple geometrical interpretation for the case n = 3. Taking $\mathfrak{V}_3$ to represent ordinary three-dimensional space we can infer from the corollary, the remark following Definition 2.5.2 (p. 63), and Theorems 8.4.7 and 8.4.13 below, that a transformation of space which preserves distance is either the product of a rotation and a translation or else the product of a rotation, a reflection, and a translation.†

8.2. Unitary matrices

In considering complex matrices, it is desirable to generalize the notion of orthogonality.

Definition 8.2.1. The complex matrix U is unitary if

$\bar{U}^T U = I.$ (8.2.1)

Thus a real unitary matrix is simply an orthogonal matrix. Unitary matrices stand in roughly the same relation to orthogonal matrices as hermitian matrices to real symmetric ones.

The defining relation (8.2.1) may, of course, be restated in many alternative ways, such as $\bar{U}^T = U^{-1}$, $U = (\bar{U}^T)^{-1}$, and $U\bar{U}^T = I$. A simple example of a unitary matrix is given by

$\begin{pmatrix} (1+i)/2 & (-1+i)/2 \\ (1+i)/2 & (1-i)/2 \end{pmatrix}.$
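The claim that this matrix is unitary is easy to check with Python's built-in complex numbers. The sketch below forms $\bar{U}^T U$ and also evaluates $|\det U|$ for comparison with Theorem 8.2.1 below. The helper names are our own.

```python
def conj_transpose(M):
    """The matrix obtained by transposing M and conjugating each entry."""
    return [[M[j][i].conjugate() for j in range(len(M))] for i in range(len(M[0]))]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

U = [[(1 + 1j) / 2, (-1 + 1j) / 2],
     [(1 + 1j) / 2, (1 - 1j) / 2]]

print(matmul(conj_transpose(U), U))     # the 2 x 2 identity, with complex entries
det_U = U[0][0] * U[1][1] - U[0][1] * U[1][0]
print(abs(det_U))                       # 1.0
```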

Exercise 8.2.1. Show that if A is hermitian and U unitary, then $U^{-1}AU$ is hermitian.

The theory of unitary matrices closely resembles that of orthogonal matrices. We shall, therefore, deal with it rather summarily and in many cases leave the details to the reader.

† We recall that by the product of a number of transformations we mean their resultant.




Theorem 8.2.1. If U is unitary, then |det U| = 1.

Theorem 8.2.2. If U, V are unitary, then so are $\bar{U}$, $U^T$, $U^{-1}$ and UV.

These results follow immediately from (8.2.1). The next theorem 
involves the notion of inner product of complex vectors, introduced 
in Definition 2.5.1 (p. 62). 

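Theorem 8.2.3. A (complex) matrix is unitary if and only if its columns (or rows) form an orthonormal set of vectors.

Write $U = (u_{rs})$. The defining relation (8.2.1) is equivalent to

$(\bar{U}^T U)_{rs} = \delta_{rs} \quad (r, s = 1, \ldots, n),$
and this means that

$\delta_{rs} = \sum_{k=1}^{n}\bar{u}_{kr}u_{ks} = (U_{*r}, U_{*s}) \quad (r, s = 1, \ldots, n).$

In other words, the columns of U form an orthonormal set. Again, (8.2.1) is equivalent to $U\bar{U}^T = I$, i.e.

$\delta_{rs} = (U\bar{U}^T)_{rs} = \sum_{k=1}^{n} u_{rk}\bar{u}_{sk} \quad (r, s = 1, \ldots, n).$

Since $U_{1*}, \ldots, U_{n*}$ are row vectors and the $\delta_{rs}$ are real, taking complex conjugates shows that this means

$(U_{r*}, U_{s*}) = \delta_{rs} \quad (r, s = 1, \ldots, n),$
i.e. the rows of U form an orthonormal set.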

Exercise 8.2.2. If X is a unitary (orthogonal) matrix of order n − 1, show that

$\begin{pmatrix} 1 & O \\ O & X \end{pmatrix}$

is a unitary (orthogonal) matrix of order n.

Exercise 8.2.3. Show that Corollaries 1 and 2 to Theorem 8.1.2 remain valid if the term ‘orthogonal’ is replaced by ‘unitary’.

Theorem 8.2.4. Let $x_1$ be a (complex) unit vector. Then there exists a unitary matrix having $x_1$ as its first column.

This theorem is proved in the same way as Theorem 8.1.3.

Theorem 8.2.5. Let $x = (x_1, \ldots, x_n)^T$, $y = (y_1, \ldots, y_n)^T$. A (complex) matrix U is unitary if and only if the substitution x = Uy transforms the expression $\bar{x}_1x_1 + \cdots + \bar{x}_nx_n$ into $\bar{y}_1y_1 + \cdots + \bar{y}_ny_n$.

The proof is immediate.

Theorem 8.2.6. A (complex) matrix U is unitary if and only if

$|Ux| = |x|$ (8.2.2)

for every complex vector x.



Equation (8.2.2) is equivalent to

$\bar{x}^T(\bar{U}^T U - I)x = 0.$ (8.2.3)

If U is unitary, then (8.2.3) is clearly satisfied for all x. Suppose, on the other hand, that (8.2.3) is satisfied for all x. Writing $V = (v_{rs}) = \bar{U}^T U - I$ and putting $x = e_k$ in (8.2.3), we obtain

$v_{kk} = 0 \quad (k = 1, \ldots, n).$ (8.2.4)

Next, let $k \ne l$ and put $x = e_k + e_l$ in (8.2.3). Making use of (8.2.4) we then obtain

$v_{kl} + v_{lk} = 0 \quad (k, l = 1, \ldots, n;\ k \ne l).$

Similarly, putting $x = e_k + ie_l$, we obtain

$v_{kl} - v_{lk} = 0 \quad (k, l = 1, \ldots, n;\ k \ne l).$

Hence $v_{kl} = 0$ whenever $k \ne l$, so that V = O, and U is unitary.

Corollary 1. A (complex) matrix U is unitary if and only if

$|Ux - Uy| = |x - y|$

for all complex vectors x, y.

Corollary 2. A (complex) matrix U is unitary if and only if

$(Ux, Uy) = (x, y)$

for all complex vectors x, y.

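Theorem 8.2.7. If $\lambda$ is a characteristic root of the unitary matrix U, then so is $1/\bar{\lambda}$.

For $1/\lambda$ is a characteristic root of $U^{-1} = \bar{U}^T$, and therefore

$0 = |(1/\lambda)I - \bar{U}^T| = \overline{|(1/\bar{\lambda})I - U^T|} = \overline{|(1/\bar{\lambda})I - U|}.$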

The following generalization of Theorem 8.1.8 is due to Frobenius 
(1883). 

Theorem 8.2.8. Every characteristic root of a unitary matrix has unit modulus.

Let $\lambda$ be a characteristic root of the unitary matrix U, and let the vector $x \ne 0$ satisfy the equation $\lambda x = Ux$. Then $\bar{\lambda}\bar{x}^T = \bar{x}^T\bar{U}^T$, and

$\bar{\lambda}\lambda\,\bar{x}^T x = \bar{x}^T\bar{U}^T U x = \bar{x}^T x.$

Hence $|\lambda|^2 = 1$, and the assertion follows. Alternatively, it may be established by using Theorem 8.2.6 and noting that

$|x| = |Ux| = |\lambda x| = |\lambda|\,|x|.$




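The reader will have observed the similarity between the proofs of Theorems 8.2.8 and 7.5.1 (p. 209). These theorems are, in fact, included in the following more general result.

Theorem 8.2.9. If the matrix A satisfies the equation

$\sum_{\nu=1}^{N} c_\nu(\bar{A}^T)^{r_\nu}A^{s_\nu} = O,$

and if $\lambda$ is any characteristic root of A, then

$\sum_{\nu=1}^{N} c_\nu\bar{\lambda}^{r_\nu}\lambda^{s_\nu} = 0.$

Let x be a characteristic vector of A, associated with $\lambda$, so that $Ax = \lambda x$. Then, for $k \ge 1$,

$A^k x = A^{k-1}Ax = \lambda A^{k-1}x,$

and therefore $A^k x = \lambda^k x$ $(k \ge 0)$.

This implies that $\bar{x}^T(\bar{A}^T)^k = \bar{\lambda}^k\bar{x}^T$ $(k \ge 0)$.

Hence $\bar{x}^T(\bar{A}^T)^r A^s x = \bar{\lambda}^r\lambda^s\bar{x}^T x$,

and so

$0 = \bar{x}^T\Big\{\sum_{\nu=1}^{N} c_\nu(\bar{A}^T)^{r_\nu}A^{s_\nu}\Big\}x = \sum_{\nu=1}^{N} c_\nu\bar{x}^T(\bar{A}^T)^{r_\nu}A^{s_\nu}x = \Big(\sum_{\nu=1}^{N} c_\nu\bar{\lambda}^{r_\nu}\lambda^{s_\nu}\Big)\bar{x}^T x.$

The assertion now follows since $\bar{x}^T x \ne 0$.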

We may restate Theorem 8.2.9 in a convenient manner by saying that if f(x, y) is a polynomial in (the non-commuting variables) x and y, and $f(\bar{A}^T, A) = O$, then $f(\bar{\lambda}, \lambda) = 0$. If we take f(x, y) = xy − 1, we obtain the result on the characteristic roots of unitary matrices (Theorem 8.2.8). The choice f(x, y) = x − y leads to the result on hermitian matrices (Theorem 7.5.1), and the choice f(x, y) = x + y to that on skew-hermitian matrices (corollary to Theorem 7.5.1).

Theorem 8.1.10 has an obvious analogue for unitary matrices.

Theorem 8.2.10. If S is a skew-hermitian matrix, then I + S is non-singular, and the matrix $(I - S)(I + S)^{-1}$ is unitary.

Exercise 8.2.4. Write out a proof of this theorem.



8.3. Rotations in the plane

The theory of orthogonal matrices owes its interest, in the first 
place, to the part it plays in the study of rotations in the plane and 
in space. Its relevance to this problem is not surprising, for 
orthogonal substitutions preserve separation and may therefore be 
expected to occur in the analysis of rigid motion. 

In this section and the next we shall discuss in some detail the relation between orthogonal matrices and rotations.† We shall find it necessary to distinguish between orthogonal matrices with positive and negative determinants.

Definition 8.3.1. The orthogonal matrix A is PROPER or IMPROPER according as |A| = 1 or |A| = −1.

Throughout the present section all matrices will be assumed to be of type 2 × 2. We shall choose some point O in the plane and take it as the origin of a system of rectangular coordinates, which will then be kept fixed throughout the section. A vector $x = (x, y)^T$ will be said to represent the point P(x, y) with respect to the given coordinate system. When we speak of a transformation x' = f(x) we shall understand a transformation which changes the point represented by x into that represented by f(x). The term ‘rotation’ will be taken to mean a rotation of the plane about the origin. As usual, a rotation will be reckoned as positive if its sense is counterclockwise.

Theorem 8.3.1. Any rotation of the plane can be represented by a proper orthogonal matrix.

Stated more explicitly this means that there exists a proper orthogonal matrix A such that, if P is any point and P′ the point into which P is carried by the given rotation, then $x' = Ax$, where $x$, $x'$ represent P, P′ respectively.

Let the plane be rotated through an angle $\alpha$ in a counterclockwise sense, while the coordinate axes remain fixed. If the point $(x, y)$ is carried into position $(x', y')$, then, as is well known,

$$x' = x\cos\alpha - y\sin\alpha, \qquad y' = x\sin\alpha + y\cos\alpha.$$

† A much fuller account of two-dimensional and three-dimensional rigid motion will be found in Schreier and Sperner, 4, 163-78. See also Schwerdtfeger, 5, 237-44.



234 ORTHOGONAL AND UNITARY MATRICES VIII, § 8.3

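In matrix form these relations may be written as

$$\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix},$$

i.e. $x' = Ax$, where $x = (x, y)^T$, $x' = (x', y')^T$, and A is a proper orthogonal matrix.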
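The matrix just obtained can be built and tested directly. This is a small sketch assuming 2×2 matrices are stored as nested lists of floats; `rotation_2d`, `apply`, `det2` and `is_orthogonal` are illustrative names, not notation from the text. It confirms that the matrix is orthogonal with determinant 1 and that a quarter-turn carries $(1, 0)$ to $(0, 1)$.

```python
import math

def rotation_2d(alpha):
    """Matrix A of the counterclockwise alpha-rotation of the plane about O."""
    c, s = math.cos(alpha), math.sin(alpha)
    return [[c, -s], [s, c]]

def apply(a, v):
    """Compute A v for a 2x2 matrix A and a vector v = (x, y)."""
    return [a[0][0] * v[0] + a[0][1] * v[1], a[1][0] * v[0] + a[1][1] * v[1]]

def det2(a):
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]

def is_orthogonal(a, tol=1e-12):
    """Check A^T A = I, i.e. that the columns of A are orthonormal."""
    for i in range(2):
        for j in range(2):
            dot = sum(a[k][i] * a[k][j] for k in range(2))
            if abs(dot - (1.0 if i == j else 0.0)) > tol:
                return False
    return True

A = rotation_2d(math.pi / 2)
print(is_orthogonal(A), round(det2(A), 12))          # True 1.0
print([round(t, 12) for t in apply(A, [1.0, 0.0])])  # [0.0, 1.0]
```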

To establish the converse of the theorem just proved we need a 
simple result on proper orthogonal matrices. 

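Theorem 8.3.2. Let A be a proper orthogonal matrix. Then there exists a unique angle $\alpha$ such that $0 \le \alpha < 2\pi$ and

$$A = \begin{pmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{pmatrix}. \qquad (8.3.1)$$

We write $A = \begin{pmatrix} p & q \\ r & s \end{pmatrix}$. Since the rows of an orthogonal matrix are unit vectors, $p^2 + q^2 = r^2 + s^2 = 1$, and there exist angles $\alpha, \beta$ such that

$$p = \cos\alpha, \quad q = -\sin\alpha, \quad 0 \le \alpha < 2\pi; \qquad r = \sin\beta, \quad s = \cos\beta, \quad 0 \le \beta < 2\pi.$$

But $1 = |A| = ps - qr = \cos(\alpha - \beta)$, and so $\alpha - \beta$ is an integral multiple of $2\pi$. Hence $\beta = \alpha$, and A is of the form (8.3.1) with $0 \le \alpha < 2\pi$. Moreover, this representation of A is obviously unique.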
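The proof identifies $\alpha$ from the first column alone: $p = \cos\alpha$ and $r = \sin\beta = \sin\alpha$. A minimal sketch of that recovery, assuming nested-list matrices and using the standard `math.atan2`; the function names are illustrative only.

```python
import math

def angle_of_proper_orthogonal_2d(a, tol=1e-9):
    """Return the unique alpha in [0, 2*pi) with
    a == [[cos alpha, -sin alpha], [sin alpha, cos alpha]] (Theorem 8.3.2)."""
    (p, q), (r, s) = a
    orthogonal = (abs(p * p + r * r - 1) < tol and abs(q * q + s * s - 1) < tol
                  and abs(p * q + r * s) < tol)
    if not orthogonal or abs(p * s - q * r - 1) > tol:
        raise ValueError("not a proper orthogonal 2x2 matrix")
    # p = cos(alpha) and r = sin(alpha) fix alpha modulo 2*pi;
    # atan2 returns a value in (-pi, pi], which is shifted into [0, 2*pi).
    return math.atan2(r, p) % (2 * math.pi)

def rotation_2d(alpha):
    c, s = math.cos(alpha), math.sin(alpha)
    return [[c, -s], [s, c]]

print(angle_of_proper_orthogonal_2d(rotation_2d(5.0)))  # approximately 5.0
```

The orthogonality test is included because a unit first column and determinant 1 alone do not make a matrix orthogonal.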

Exercise 8.3.1. Show that, if A is a proper orthogonal matrix, then $x' = Ax$ represents a rotation.

Theorem 8.3.3. For every proper orthogonal matrix A there exists a unique angle $\alpha$ such that $0 \le \alpha \le \pi$ and the characteristic roots of A are $e^{\pm i\alpha}$. The equation $x' = Ax$ then represents a rotation either through $\alpha$ or through $2\pi - \alpha$.

By Theorem 8.1.8 we know that the characteristic roots of A have unit modulus, and it is obvious that they are either both real or else are conjugate complex numbers. Furthermore, their product is equal to $|A| = 1$, and so they cannot be $1$ and $-1$. They may therefore be taken as $e^{i\alpha}, e^{-i\alpha}$, where $0 \le \alpha < 2\pi$. It is, however, possible to restrict $\alpha$ to the range $0 \le \alpha \le \pi$, for, since

$$e^{i(2\pi - \alpha)} = e^{-i\alpha},$$



we can replace $\alpha$ by $2\pi - \alpha$ whenever $\pi < \alpha < 2\pi$. The characteristic roots of A are thus $e^{\pm i\alpha}$, where $0 \le \alpha \le \pi$, and the value of $\alpha$ satisfying these requirements is plainly unique. The first part of the theorem is therefore proved.

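By Theorem 8.3.2, A may be written in the form

$$A = \begin{pmatrix} \cos\beta & -\sin\beta \\ \sin\beta & \cos\beta \end{pmatrix},$$

where $0 \le \beta < 2\pi$. The characteristic roots of A are therefore $e^{\pm i\beta}$, and hence $e^{i\beta} = e^{i\alpha}$ or $e^{i\beta} = e^{-i\alpha}$, i.e. $\beta = \alpha$ or $\beta = 2\pi - \alpha$.

Hence the equation $x' = Ax$, which can be written as

$$x' = x\cos\beta - y\sin\beta, \qquad y' = x\sin\beta + y\cos\beta,$$

represents a rotation either through $\alpha$ or through $2\pi - \alpha$.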
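A numerical illustration of Theorem 8.3.3, as a sketch under the assumption that matrices are nested lists of floats (the function names are hypothetical). Since the roots $e^{\pm i\alpha}$ have sum $\operatorname{tr} A = 2\cos\alpha$, the angle in $[0, \pi]$ is $\cos^{-1}(\tfrac12\operatorname{tr} A)$. A rotation through $\beta = 5$ radians then yields $\alpha = 2\pi - 5$, the "either $\alpha$ or $2\pi - \alpha$" case of the theorem.

```python
import cmath
import math

def rotation_2d(beta):
    c, s = math.cos(beta), math.sin(beta)
    return [[c, -s], [s, c]]

def char_roots_2d(a):
    """Roots of the characteristic polynomial t^2 - (tr A) t + |A| of a 2x2 matrix."""
    tr = a[0][0] + a[1][1]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    disc = cmath.sqrt(tr * tr - 4 * det)
    return (tr + disc) / 2, (tr - disc) / 2

def theorem_833_angle(a):
    """The alpha in [0, pi] whose e^{+-i alpha} are the characteristic roots of a
    proper orthogonal A: the roots sum to tr A = 2 cos(alpha)."""
    tr = a[0][0] + a[1][1]
    return math.acos(max(-1.0, min(1.0, tr / 2)))  # clamp guards against rounding

A = rotation_2d(5.0)          # a rotation through beta = 5 radians
alpha = theorem_833_angle(A)  # equals 2*pi - 5, because 5 > pi
print(alpha, char_roots_2d(A))
```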


Finally, we give a geometrical interpretation of improper 
orthogonal matrices. 

Theorem 8.3.4. Let A be an improper orthogonal matrix. Then 
the equation x' = Ax represents the product of a rotation and a 
reflection in a line through the origin. 


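Write

$$B = A\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, \quad\text{so that}\quad A = B\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.$$

Then B is a proper orthogonal matrix. The transformation $x' = Ax$ is the product (in that order) of the transformations

$$x' = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}x, \qquad x' = Bx.$$

The second of these is, by Theorem 8.3.3, a rotation. The first may be written as $x' = x$, $y' = -y$, and is therefore a reflection in the x-axis. Thus the transformation $x' = Ax$ is the product (in that order) of a reflection and a rotation. By considering the matrix

$$\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}A$$

in place of B, we see that $x' = Ax$ is equally the product of a rotation and a reflection.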
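The factorisation used in the proof can be checked on a concrete improper matrix. This sketch assumes nested-list 2×2 matrices; `F` stands for the reflection matrix diag(1, −1), and `split_improper` is a hypothetical helper name. The example A, with first column $(\cos\theta, \sin\theta)$ and determinant −1, is one illustrative choice.

```python
import math

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def det2(a):
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]

F = [[1.0, 0.0], [0.0, -1.0]]  # reflection in the x-axis: x' = x, y' = -y

def split_improper(a):
    """For an improper orthogonal 2x2 matrix A, return B = A F.
    B is proper orthogonal and A = B F, since F F = I."""
    return mat_mul(a, F)

theta = 0.7
A = [[math.cos(theta), math.sin(theta)],
     [math.sin(theta), -math.cos(theta)]]  # orthogonal with |A| = -1
B = split_improper(A)                      # here B is the theta-rotation matrix
```

Applying F first and then B reproduces A, which is the "reflection, then rotation" reading in the proof.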




It is interesting to observe that (in view of Theorem 8.3.3) the characteristic roots of a proper orthogonal matrix virtually determine its geometrical significance. This is no longer the case for improper orthogonal matrices, since the characteristic roots of any such matrix are $1$ and $-1$.

Exercise 8.3.2. Prove the last statement. 


8.4. Rotations in space 

In this section all matrices are assumed to be of type 3×3. We shall find it convenient to write

$$R_\alpha = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\alpha & -\sin\alpha \\ 0 & \sin\alpha & \cos\alpha \end{pmatrix}.$$


8.4.1. We begin with a number of purely algebraic results concerning properties of proper orthogonal matrices.

Theorem 8.4.1. For every proper orthogonal matrix A there exists a unique angle $\alpha$ such that $0 \le \alpha \le \pi$ and the characteristic roots of A are $1, e^{\pm i\alpha}$.

By Theorem 8.1.9 (i) (p. 226) at least one characteristic root of A is equal to 1. The remaining two roots are, of course, of unit modulus and are either both real or else are conjugate complex numbers. Since $|A| = 1$, the characteristic roots of A cannot be $1, 1, -1$. Hence they are $1, e^{\pm i\alpha}$, where it may, of course, be assumed that $0 \le \alpha < 2\pi$. The angle $\alpha$ can now be made unique by being restricted to the range $0 \le \alpha \le \pi$.†
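Because the characteristic roots $1, e^{\pm i\alpha}$ sum to the trace, $\operatorname{tr} A = 1 + 2\cos\alpha$, and the angle of Theorem 8.4.1 can be read off without computing roots. A minimal sketch, assuming nested-list 3×3 matrices; `R` mirrors the matrix $R_\alpha$ defined above, and `angle_of` is an illustrative name.

```python
import math

def R(alpha):
    """The matrix R_alpha of Section 8.4 (rotation through alpha about the x-axis)."""
    c, s = math.cos(alpha), math.sin(alpha)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def angle_of(a):
    """Angle alpha in [0, pi] of a proper orthogonal 3x3 matrix (Definition 8.4.1).
    The roots 1, e^{i alpha}, e^{-i alpha} sum to tr A, so tr A = 1 + 2 cos(alpha)."""
    tr = a[0][0] + a[1][1] + a[2][2]
    return math.acos(max(-1.0, min(1.0, (tr - 1) / 2)))  # clamp guards rounding

print(angle_of(R(1.0)), angle_of(R(5.0)))  # 1.0 and 2*pi - 5, up to rounding
```

Since the trace is unchanged by similarity, the same function works for any $TR_\alpha T^{-1}$, not only for $R_\alpha$ itself.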


Definition 8.4.1. The unique angle $\alpha$ of the preceding theorem will be called the ANGLE of the (proper orthogonal) matrix A.

Theorem 8.4.2. If A is a proper orthogonal matrix with angle $\alpha$, $\xi$ is a unit characteristic vector of A associated with the characteristic root 1, and T is any proper orthogonal matrix with $\xi$ as its first column,‡ then

$$A = TR_\beta T^{-1},$$

where $\beta = \alpha$ or $2\pi - \alpha$.

We have, by hypothesis,§

$$A\xi = \xi, \qquad Te_1 = \xi.$$


† Cf. the proof of Theorem 8.3.3.

‡ Such a matrix exists by Theorem 8.1.3 (p. 223).

§ We write, of course, $e_1 = (1, 0, 0)^T$, $e_2 = (0, 1, 0)^T$, $e_3 = (0, 0, 1)^T$.





VIII, § 8.4 ROTATIONS IN SPACE 

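Putting $B = (b_{rs}) = T^{-1}AT$, we obtain

$$Be_1 = T^{-1}ATe_1 = T^{-1}A\xi = T^{-1}\xi = e_1,$$

and, since $Be_1$ is simply the first column of B, this means that

$$B = \begin{pmatrix} 1 & b_{12} & b_{13} \\ 0 & b_{22} & b_{23} \\ 0 & b_{32} & b_{33} \end{pmatrix}.$$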

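Now B, being a product of proper orthogonal matrices, is itself a proper orthogonal matrix. Hence, since its first row is a unit vector,

$$1 + b_{12}^2 + b_{13}^2 = 1,$$

and so $b_{12} = b_{13} = 0$. We therefore have

$$B = \begin{pmatrix} 1 & 0 & 0 \\ 0 & b_{22} & b_{23} \\ 0 & b_{32} & b_{33} \end{pmatrix}.$$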

The submatrix

$$\begin{pmatrix} b_{22} & b_{23} \\ b_{32} & b_{33} \end{pmatrix}$$

must again be proper orthogonal and so, by Theorem 8.3.2, there exists an angle $\beta$ such that $0 \le \beta < 2\pi$ and $B = R_\beta$, i.e. $A = TR_\beta T^{-1}$. Now the characteristic roots of $R_\beta$ are $1, e^{\pm i\beta}$, while those of A are $1, e^{\pm i\alpha}$. But, by Theorem 7.2.1 (p. 199), $R_\beta$ and A have the same characteristic roots. Hence $\beta = \alpha$ or $2\pi - \alpha$, and the theorem is proved.

Definition 8.4.2. If A is a proper orthogonal matrix, any unit vector $\xi$ satisfying the equation $A\xi = \xi$ will be called a PRINCIPAL VECTOR of A.

Theorem 8.4.3. If A is a proper orthogonal matrix other than I, then its principal vector is uniquely determined to within a scalar multiple $\pm 1$.

In view of the corollary to the rank-multiplicity theorem (Theorem 7.6.1, p. 214), it suffices to show that 1 is a simple characteristic root of A. Denote the angle of A by $\alpha$. If $\alpha = 0$, then the angle $\beta$ of Theorem 8.4.2 is 0 or $2\pi$, and then $A = I$ contrary to hypothesis. Hence $0 < \alpha \le \pi$, and as the characteristic roots of A are $1, e^{\pm i\alpha}$, it follows that 1 is, in fact, a simple root.

8.4.2. We next consider the geometrical aspect of the theory of orthogonal matrices. We shall choose a system $S$ of rectangular coordinate axes, to be kept fixed throughout the discussion, and we shall refer all measurements to this system unless the contrary is stated. If $\bar{S}$ is another system of rectangular coordinates with the same origin O as $S$, we can associate with it the unique orthogonal matrix whose columns are the vectors representing (with respect to $S$) the points on the positive axes of $\bar{S}$ and at a unit distance from O. This matrix will be called the matrix of $\bar{S}$. In this way a biunique correspondence is established between orthogonal matrices and systems of rectangular coordinates with O as origin. The matrix of the fundamental system $S$ has, of course, $e_1, e_2, e_3$ as its columns and is therefore the unit matrix.

Theorem 8.4.4. If P is the matrix of the coordinate system $\bar{S}$, and if a point X is represented by the vectors $x$, $\bar{x}$ with respect to $S$, $\bar{S}$ respectively, then

$$\bar{x} = P^T x.$$

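Write $P = (p_{rs})$. Then the unit vectors $\bar{e}_1, \bar{e}_2, \bar{e}_3$ along the positive axes of $\bar{S}$ are given by

$$\begin{aligned} \bar{e}_1 &= p_{11}e_1 + p_{21}e_2 + p_{31}e_3, \\ \bar{e}_2 &= p_{12}e_1 + p_{22}e_2 + p_{32}e_3, \\ \bar{e}_3 &= p_{13}e_1 + p_{23}e_2 + p_{33}e_3. \end{aligned} \qquad (8.4.1)$$

Since P is orthogonal, it follows easily that†

$$\begin{aligned} e_1 &= p_{11}\bar{e}_1 + p_{12}\bar{e}_2 + p_{13}\bar{e}_3, \\ e_2 &= p_{21}\bar{e}_1 + p_{22}\bar{e}_2 + p_{23}\bar{e}_3, \\ e_3 &= p_{31}\bar{e}_1 + p_{32}\bar{e}_2 + p_{33}\bar{e}_3. \end{aligned}$$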

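Writing $x = (x, y, z)^T$ and $\bar{x} = (\bar{x}, \bar{y}, \bar{z})^T$, we therefore have

$$\begin{aligned} \overrightarrow{OX} &= xe_1 + ye_2 + ze_3 \\ &= x(p_{11}\bar{e}_1 + p_{12}\bar{e}_2 + p_{13}\bar{e}_3) + y(p_{21}\bar{e}_1 + p_{22}\bar{e}_2 + p_{23}\bar{e}_3) + z(p_{31}\bar{e}_1 + p_{32}\bar{e}_2 + p_{33}\bar{e}_3) \\ &= (p_{11}x + p_{21}y + p_{31}z)\bar{e}_1 + (p_{12}x + p_{22}y + p_{32}z)\bar{e}_2 + (p_{13}x + p_{23}y + p_{33}z)\bar{e}_3. \end{aligned}$$

Thus

$$\bar{x} = p_{11}x + p_{21}y + p_{31}z, \qquad \bar{y} = p_{12}x + p_{22}y + p_{32}z, \qquad \bar{z} = p_{13}x + p_{23}y + p_{33}z.$$

† Cf. Exercise 8.1.1 (p. 222).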





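i.e.

$$\bar{x} = \begin{pmatrix} p_{11} & p_{21} & p_{31} \\ p_{12} & p_{22} & p_{32} \\ p_{13} & p_{23} & p_{33} \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix} = P^T x,$$

as asserted.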
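Theorem 8.4.4 says that the r-th new coordinate is the inner product of $x$ with the r-th column of P. A small sketch, assuming nested-list matrices; `coords_in` is a hypothetical helper and the example system (axes turned through 90° about the z-axis) is chosen purely for illustration.

```python
def coords_in(p, x):
    """Coordinates x_bar = P^T x (Theorem 8.4.4): entry r is the inner product
    of x with column r of P, i.e. with the unit vector e_bar_r."""
    return [sum(p[k][r] * x[k] for k in range(3)) for r in range(3)]

# Columns of P: e_bar_1 = e_2, e_bar_2 = -e_1, e_bar_3 = e_3.
P = [[0, -1, 0],
     [1,  0, 0],
     [0,  0, 1]]
x_bar = coords_in(P, [1, 0, 0])   # [0, -1, 0], since e_1 = -e_bar_2
```

Multiplying back by P recovers $x$, reflecting $P^{-1} = P^T$.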


It may be noted in passing that the relations between $x, y, z$ and $\bar{x}, \bar{y}, \bar{z}$ (i.e. between the coordinates of the same point with respect to the two systems $S$, $\bar{S}$) may be exhibited in a convenient form by means of the following table:

$$\begin{array}{c|ccc} & x & y & z \\ \hline \bar{x} & p_{11} & p_{21} & p_{31} \\ \bar{y} & p_{12} & p_{22} & p_{32} \\ \bar{z} & p_{13} & p_{23} & p_{33} \end{array}$$

The rows of this array give the direction cosines of the axes of $\bar{S}$ with respect to the axes of $S$, while the columns give the direction cosines of the axes of $S$ with respect to those of $\bar{S}$. For, by (8.4.1),

$$(\bar{e}_r, e_s) = p_{sr} \qquad (r, s = 1, 2, 3),$$

and $(\bar{e}_r, e_s)$ is clearly the cosine of the angle between the $r$th axis of $\bar{S}$ and the $s$th axis of $S$.

In discussing spatial relations we cannot avoid referring, in 
one form or another, to right-handedness and left-handedness of 
coordinate systems. This intuitive notion (like the notion of a 
counterclockwise sense) is unambiguous but does not admit of a 
precise verbal definition. We are able, however, to express mathematically the distinction between right-handed and left-handed
systems. 

Definition 8.4.3. The coordinate system $\bar{S}$ will be called PROPER or IMPROPER according as its matrix is proper or improper orthogonal.

The property of being proper or improper is not an intrinsic feature of $\bar{S}$ but one that characterizes the relation of $\bar{S}$ to the arbitrarily chosen fundamental system $S$. In fact, if $S$ is right-handed, then $\bar{S}$ is right-handed or left-handed according as it is proper or improper, while if $S$ is left-handed, then $\bar{S}$ is right-handed or left-handed according as it is improper or proper. We can justify these assertions by a continuity argument. Thus, if $S$ and $\bar{S}$ are both right-handed, then $\bar{S}$ can be obtained from $S$ by a continuous rigid motion of the coordinate axes, in which the determinant of the matrix associated with the coordinate system cannot change its value discontinuously. Since the determinant of an orthogonal matrix can only be $1$ or $-1$, it remains constant throughout the motion. Its value is 1 for $S$, so it is also 1 for $\bar{S}$, and $\bar{S}$ is therefore proper.



8.4.3. We now turn to the mathematical analysis of rotation. By ‘rotation’ we shall understand a rotation of space about a directed line through the origin. This line will be called the axis of rotation. A rotation through an angle $\alpha$ about the positive x-axis clearly carries any point $x = (x, y, z)^T$ into the point $x' = (x', y', z')^T$, where

$$x' = x, \qquad y' = y\cos\alpha - z\sin\alpha, \qquad z' = y\sin\alpha + z\cos\alpha, \qquad (8.4.2)$$

or, in matrix notation,

$$x' = R_\alpha x. \qquad (8.4.3)$$

In these formulae all components are assumed to be measured with respect to a right-handed system of coordinates.

Consider now a rotation, through $\alpha$, about any directed line $l$. This rotation can be represented by the equation (8.4.3) with respect to a right-handed system of coordinates having $l$ as the positive x-axis. Accordingly, a convenient mathematical definition of rotation can be given as follows.


Definition 8.4.4. A ROTATION through an angle $\alpha$ about a directed line $l$ (through the origin) is the transformation represented by (8.4.3) with respect to a proper coordinate system $\bar{S}$ having $l$ as its positive x-axis.


For the sake of brevity we shall contract the phrase ‘rotation through an angle $\alpha$’ into ‘an $\alpha$-rotation’.

If $S$ is right-handed, then $\bar{S}$ is also right-handed and the definition just given agrees with our intuitive notion of rotation. In interpreting geometrically the theorems deduced below, we shall accordingly take a right-handed system of axes as the fundamental system $S$.

Theorem 8.4.5. Let $l$ be a directed line through the origin and let $l'$ be the same line with its sense reversed. Then an $\alpha$-rotation about $l$ is equivalent to a $(2\pi - \alpha)$-rotation about $l'$.




An $\alpha$-rotation about $l$ is represented by the equations (8.4.2) with respect to a proper coordinate system in which the positive x-axis lies along $l$. We now take a new system of coordinates by reversing the directions of the x-axis and the y-axis. This new system is again proper, its positive x-axis lies along $l'$, and with respect to it the rotation in question assumes the form

$$x' = x, \qquad y' = y\cos\alpha + z\sin\alpha, \qquad z' = -y\sin\alpha + z\cos\alpha,$$

i.e. $x' = R_{-\alpha}x = R_{2\pi - \alpha}x$. This transformation is, by definition, a $(2\pi - \alpha)$-rotation about $l'$.

The next step is to obtain the matrix representation of a prescribed rotation.

Theorem 8.4.6. Let $P = (\lambda, \mu, \nu)$ be a point at unit distance from the origin. Then the equation representing the $\alpha$-rotation of space about the line $\overrightarrow{OP}$ is given by

$$x' = TR_\alpha T^{-1}x,$$

where T is any proper orthogonal matrix with $(\lambda, \mu, \nu)^T$ as its first column.

Let T be a proper orthogonal matrix with $(\lambda, \mu, \nu)^T$ as its first column, and let $\bar{S}$ be the coordinate system whose matrix is T. Then $\bar{S}$ is proper and $\overrightarrow{OP}$ is its positive x-axis. If X is any point and if the vectors representing it with respect to $S$, $\bar{S}$ are $x$, $\bar{x}$ respectively, then, by Theorem 8.4.4,

$$\bar{x} = T^{-1}x.$$

The $\alpha$-rotation about $\overrightarrow{OP}$ carries X into a point X′, say, and the vectors $\bar{x}$, $\bar{x}'$ representing these points with respect to $\bar{S}$ are, in view of (8.4.3), connected by the equation

$$\bar{x}' = R_\alpha \bar{x}.$$

Again, if $x'$ is the vector representing X′ with respect to $S$, then

$$\bar{x}' = T^{-1}x'.$$

Combining these results, we obtain

$$x' = T\bar{x}' = TR_\alpha\bar{x} = TR_\alpha T^{-1}x,$$

as asserted.



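It is sometimes useful to break up the formula of Theorem 8.4.6 into expressions for the separate components of $x'$. Writing

$$T = \begin{pmatrix} \lambda & \lambda' & \lambda'' \\ \mu & \mu' & \mu'' \\ \nu & \nu' & \nu'' \end{pmatrix}$$

and using the orthogonality relations for T as well as the equations of Exercise 8.1.4 (p. 223), we are led to the identity

$$TR_\alpha T^{-1} = \begin{pmatrix} c + \lambda^2(1-c) & \lambda\mu(1-c) - \nu s & \nu\lambda(1-c) + \mu s \\ \lambda\mu(1-c) + \nu s & c + \mu^2(1-c) & \mu\nu(1-c) - \lambda s \\ \nu\lambda(1-c) - \mu s & \mu\nu(1-c) + \lambda s & c + \nu^2(1-c) \end{pmatrix},$$

where $c = \cos\alpha$, $s = \sin\alpha$. Writing $x' = (x', y', z')^T$, $x = (x, y, z)^T$ in Theorem 8.4.6, we now obtain

$$\begin{aligned} x' &= x\cos\alpha + (\mu z - \nu y)\sin\alpha + \lambda(1 - \cos\alpha)(\lambda x + \mu y + \nu z), \\ y' &= y\cos\alpha + (\nu x - \lambda z)\sin\alpha + \mu(1 - \cos\alpha)(\lambda x + \mu y + \nu z), \\ z' &= z\cos\alpha + (\lambda y - \mu x)\sin\alpha + \nu(1 - \cos\alpha)(\lambda x + \mu y + \nu z). \end{aligned}$$

These formulae are known as Euler's equations of transformation.†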
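The identity above can be cross-checked numerically by computing $TR_\alpha T^{-1}$ for one admissible T and comparing it with Euler's equations. This is an illustrative sketch, not the author's derivation: `frame_with_first_column` is a hypothetical helper that builds one proper orthogonal T with $\xi$ as first column from cross products (Theorem 8.4.6 allows any such T), and matrices are nested lists of floats.

```python
import math

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]

def normalize(v):
    n = math.sqrt(sum(t * t for t in v))
    return [t / n for t in v]

def frame_with_first_column(xi):
    """One proper orthogonal T whose first column is the unit vector xi."""
    helper = [1.0, 0.0, 0.0] if abs(xi[0]) < 0.9 else [0.0, 1.0, 0.0]  # not parallel to xi
    xi1 = normalize(cross(xi, helper))
    xi2 = cross(xi, xi1)  # unit, and xi1 x xi2 = xi, so det T = 1
    return [[xi[i], xi1[i], xi2[i]] for i in range(3)]

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def transpose(a):
    return [[a[j][i] for j in range(3)] for i in range(3)]

def R(alpha):
    c, s = math.cos(alpha), math.sin(alpha)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def rotation_matrix(xi, alpha):
    """T R_alpha T^{-1} of Theorem 8.4.6, using T^{-1} = T^T."""
    t = frame_with_first_column(xi)
    return mat_mul(mat_mul(t, R(alpha)), transpose(t))

def euler(x, xi, alpha):
    """Image of x under the alpha-rotation about xi, by Euler's equations."""
    lam, mu, nu = xi
    X, Y, Z = x
    c, s = math.cos(alpha), math.sin(alpha)
    d = lam * X + mu * Y + nu * Z
    return [X * c + (mu * Z - nu * Y) * s + lam * (1 - c) * d,
            Y * c + (nu * X - lam * Z) * s + mu * (1 - c) * d,
            Z * c + (lam * Y - mu * X) * s + nu * (1 - c) * d]
```

With $\xi = (0, 0, 1)$ and $\alpha = \tfrac12\pi$, `euler([1, 0, 0], ...)` gives $(0, 1, 0)$ up to rounding, the expected counterclockwise quarter-turn about the z-axis.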

Exercise 8.4.1. Obtain the equations (8.4.2) as a special case of Euler’s equations of transformation.

Exercise 8.4.2. Show that in Euler’s equations of transformation it is permissible to interchange the accented and the unaccented coordinates provided that, at the same time, $\alpha$ is replaced by $-\alpha$.

Theorem 8.4.6 shows how a matrix representation for a given rotation can be determined. The next result deals with the converse problem of interpreting geometrically a given orthogonal transformation.

Theorem 8.4.7. Let A be a proper orthogonal matrix other than I, and let $\alpha$ be its angle. Then the equation $x' = Ax$ represents an $\alpha$-rotation about a suitably directed line specified by a principal vector of A.

Let $\xi$ be a principal vector of A and write $\xi = (\lambda, \mu, \nu)^T$. Then $\xi$ specifies the directed line $l = \overrightarrow{OP}$, where $P = (\lambda, \mu, \nu)$. We denote by $l'$ the same line with its sense reversed. Since, by Theorem 8.4.3, the principal vector of A $(\ne I)$ is determined to within a scalar multiple $\pm 1$, it follows that the line specified by the principal vector of A is unique except for its sense.

† For a simple geometrical derivation see Sommerville, Analytical Geometry of Three Dimensions, 38-39.




Let T be any proper orthogonal matrix having $\xi$ as its first column. Then, by Theorem 8.4.2,

$$A = TR_\beta T^{-1},$$

where $\beta = \alpha$ or $2\pi - \alpha$; and the equation $x' = Ax$ can therefore be written in the form

$$T^{-1}x' = R_\beta T^{-1}x. \qquad (8.4.4)$$

If $\bar{S}$ is the (proper) coordinate system associated with the matrix T, then (by Theorem 8.4.4) the transformation (8.4.4) assumes, with respect to $\bar{S}$, the form

$$\bar{x}' = R_\beta\bar{x};$$

and this is a $\beta$-rotation about the positive x-axis of $\bar{S}$, i.e. about $l$. Thus, by virtue of Theorem 8.4.5, the transformation in question is an $\alpha$-rotation about $l$ or about $l'$.
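Theorem 8.4.7 can be turned into a small procedure that recovers the angle and a suitably directed axis from A. The sketch below uses a standard technique that the text does not state explicitly: by Euler's equations, $A - A^T$ is $2\sin\alpha$ times the matrix of $x \mapsto \xi \times x$, so for $0 < \alpha < \pi$ the vector $(a_{32} - a_{23}, a_{13} - a_{31}, a_{21} - a_{12})$ is $2\sin\alpha\,\xi$. For $\alpha = \pi$ it falls back on $A + I = 2\xi\xi^T$. The function name and tolerance are assumptions.

```python
import math

def axis_angle(a, tol=1e-9):
    """Return (alpha, xi) for a proper orthogonal 3x3 A != I: alpha is the angle of A
    and x' = A x is an alpha-rotation about the unit vector xi (Theorem 8.4.7)."""
    tr = a[0][0] + a[1][1] + a[2][2]
    alpha = math.acos(max(-1.0, min(1.0, (tr - 1) / 2)))  # tr A = 1 + 2 cos(alpha)
    if alpha < tol:
        raise ValueError("A = I: every unit vector is a principal vector")
    # A - A^T = 2 sin(alpha) [xi]_x, so this vector equals 2 sin(alpha) xi.
    w = [a[2][1] - a[1][2], a[0][2] - a[2][0], a[1][0] - a[0][1]]
    norm_w = math.sqrt(sum(t * t for t in w))
    if norm_w > tol:
        return alpha, [t / norm_w for t in w]
    # alpha = pi: A = 2 xi xi^T - I, so (A + I)/2 = xi xi^T. Take the column with the
    # largest diagonal entry; either sign of xi is correct (Theorem 8.4.5 with alpha = pi).
    k = max(range(3), key=lambda i: a[i][i])
    col = [(a[i][k] + (1.0 if i == k else 0.0)) / 2 for i in range(3)]
    n = math.sqrt(sum(t * t for t in col))
    return alpha, [t / n for t in col]

# Quarter-turn about the z-axis.
alpha, xi = axis_angle([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
```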


Theorem 8.4.8. The product of two rotations is again a rotation.

Suppose that the first rotation carries the point represented by $x$ into that represented by $x'$, and that the second carries the point represented by $x'$ into that represented by $x''$. Then it follows by Theorem 8.4.6 that there exist proper orthogonal matrices A, B such that

$$x' = Ax, \qquad x'' = Bx'.$$

Hence the product of the two rotations is a transformation of space which carries the point represented by $x$ into that represented by $BAx$. Now BA is again a proper orthogonal matrix and the transformation in question is, by Theorem 8.4.7 (or trivially, if $BA = I$), once more a rotation.
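The key fact in this proof, that BA is again proper orthogonal, is easy to observe numerically. A minimal sketch, assuming nested-list matrices; `rot_x` and `rot_z` are illustrative rotations about the coordinate axes, and the angles 0.4 and 1.1 are arbitrary.

```python
import math

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def det3(a):
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))

def is_orthogonal(a, tol=1e-12):
    """Check A^T A = I entrywise."""
    ata = mat_mul([list(r) for r in zip(*a)], a)
    return all(abs(ata[i][j] - (1.0 if i == j else 0.0)) < tol
               for i in range(3) for j in range(3))

def rot_x(t):
    c, s = math.cos(t), math.sin(t)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def rot_z(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

A = rot_x(0.4)      # first rotation:  x' = A x
B = rot_z(1.1)      # second rotation: x'' = B x'
BA = mat_mul(B, A)  # the product carries x to BA x
```

Note the order: the rotation performed first appears on the right.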

Theorem 8.4.8 will presently be superseded by the more general 
Theorem 8.4.12. 


8.4.4. We have so far used orthogonal matrices to represent rotations. A different mode of representation in terms of (real) skew-symmetric matrices is to be considered next.

Let R be a rotation through an angle $\alpha$ $(0 \le \alpha < 2\pi)$ about the line $\overrightarrow{OP}$, where $P = (\lambda, \mu, \nu)$ is a point at unit distance from the origin. If $\alpha \ne \pi$, we associate with R the skew-symmetric matrix

$$\begin{pmatrix} 0 & \nu\tan\frac12\alpha & -\mu\tan\frac12\alpha \\ -\nu\tan\frac12\alpha & 0 & \lambda\tan\frac12\alpha \\ \mu\tan\frac12\alpha & -\lambda\tan\frac12\alpha & 0 \end{pmatrix}. \qquad (8.4.5)$$

Conversely, any skew-symmetric matrix S can be written in the form (8.4.5), where $\lambda^2 + \mu^2 + \nu^2 = 1$ and $0 \le \alpha < 2\pi$. This expression for S is not unique, since $\alpha, \lambda, \mu, \nu$ can be replaced by $2\pi - \alpha, -\lambda, -\mu, -\nu$ respectively, but these two sets of parameters correspond to the same rotation.† We have thus set up a biunique correspondence between the set of all rotations (other than π-rotations) and the set of all skew-symmetric matrices. The matrix S associated with the rotation R will be called the skew-symmetric matrix of R. Similarly, the (proper) orthogonal matrix A which represents R, in the sense that R is specified by the equation $x' = Ax$, will be called the orthogonal matrix of R. The advantage of associating with R a skew-symmetric rather than an orthogonal matrix is that from the former the geometrical character of R can be inferred almost immediately.


Exercise 8.4.3. Show that the skew-symmetric matrix

$$\begin{pmatrix} 0 & -\sqrt{2} & -1/\sqrt{2} \\ \sqrt{2} & 0 & 1/\sqrt{2} \\ 1/\sqrt{2} & -1/\sqrt{2} & 0 \end{pmatrix}$$

represents the rotation through 120° about the directed line joining the origin to the point $(1, 1, -2)$.
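The correspondence (8.4.5) works in both directions, and that is what makes the geometry "almost immediate": the entries $s_{23}, s_{31}, s_{12}$ are $(\lambda, \mu, \nu)\tan\tfrac12\alpha$. A sketch assuming nested-list matrices; `skew_matrix` and `rotation_from_skew` are illustrative names, and the inverse map returns the parameter set with $0 \le \alpha < \pi$ (the other set being $2\pi - \alpha, -\lambda, -\mu, -\nu$). The data of Exercise 8.4.3 serve as a numerical check.

```python
import math

def skew_matrix(alpha, xi):
    """The skew-symmetric matrix (8.4.5) of the alpha-rotation about the unit
    vector xi = (lambda, mu, nu); defined only for alpha != pi."""
    lam, mu, nu = xi
    t = math.tan(alpha / 2)
    return [[0.0, nu * t, -mu * t],
            [-nu * t, 0.0, lam * t],
            [mu * t, -lam * t, 0.0]]

def rotation_from_skew(s):
    """Recover (alpha, xi) with 0 <= alpha < pi from a skew-symmetric S."""
    v = [s[1][2], s[2][0], s[0][1]]  # (lambda, mu, nu) * tan(alpha / 2)
    t = math.sqrt(sum(c * c for c in v))
    if t == 0:
        return 0.0, None             # S = O corresponds to the identity
    return 2 * math.atan(t), [c / t for c in v]

r2 = math.sqrt(2)
S_ex = [[0.0, -r2, -1 / r2], [r2, 0.0, 1 / r2], [1 / r2, -1 / r2, 0.0]]
alpha, xi = rotation_from_skew(S_ex)  # 2*pi/3 about (1, 1, -2)/sqrt(6), up to rounding
```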


Theorem 8.4.9. Let R be a rotation other than a π-rotation, and let A and S be its orthogonal matrix and its skew-symmetric matrix respectively. Then

$$A = (I - S)(I + S)^{-1}, \qquad S = (I - A)(I + A)^{-1}.$$

It is, of course, sufficient to establish either of these equations. We may note at once that, since S is a real skew-symmetric matrix, $I + S$ is non-singular and, by Theorem 8.1.10 (p. 227), $(I - S)(I + S)^{-1}$ is orthogonal.

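Let the given rotation R be specified by the numbers $\alpha, \lambda, \mu, \nu$ defined above. Then, by Theorem 8.4.6, $A = TR_\alpha T^{-1}$, where T is any proper orthogonal matrix with $(\lambda, \mu, \nu)^T$ as its first column, say

$$T = \begin{pmatrix} \lambda & \lambda' & \lambda'' \\ \mu & \mu' & \mu'' \\ \nu & \nu' & \nu'' \end{pmatrix}.$$

Now $\alpha \ne \pi$, and therefore, since the characteristic roots of A are $1, e^{\pm i\alpha}$, $-1$ is not a characteristic root of A. Hence $I + A$ is non-singular, and

$$\begin{aligned} (I - A)(I + A)^{-1} &= (I - TR_\alpha T^{-1})(I + TR_\alpha T^{-1})^{-1} \\ &= T(I - R_\alpha)T^{-1}\,[T(I + R_\alpha)T^{-1}]^{-1} \\ &= T(I - R_\alpha)(I + R_\alpha)^{-1}T^{-1}. \end{aligned}$$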


† Cf. Theorem 8.4.5.




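Now it is easy to verify that

$$(I - R_\alpha)(I + R_\alpha)^{-1} = \tan\tfrac12\alpha\begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & -1 & 0 \end{pmatrix}.$$

Hence, using Exercise 8.1.4 (p. 223), we obtain

$$\begin{aligned} (I - A)(I + A)^{-1} &= \tan\tfrac12\alpha\begin{pmatrix} \lambda & \lambda' & \lambda'' \\ \mu & \mu' & \mu'' \\ \nu & \nu' & \nu'' \end{pmatrix}\begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & -1 & 0 \end{pmatrix}\begin{pmatrix} \lambda & \mu & \nu \\ \lambda' & \mu' & \nu' \\ \lambda'' & \mu'' & \nu'' \end{pmatrix} \\ &= \tan\tfrac12\alpha\begin{pmatrix} 0 & \nu & -\mu \\ -\nu & 0 & \lambda \\ \mu & -\lambda & 0 \end{pmatrix} = S. \end{aligned}$$

The theorem is therefore proved.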
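Both formulae of Theorem 8.4.9 can be checked for a sample rotation by comparing the Cayley transform of S with the matrix of Euler's equations, and vice versa. A sketch assuming nested-list matrices; `inv3` uses the adjugate (cofactor) formula, and all function names, as well as the sample axis $(1, 2, 2)/3$ and angle 1.2, are illustrative choices.

```python
import math

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def inv3(m):
    """Inverse of a 3x3 matrix: the adjugate (transposed cofactor matrix) over the determinant."""
    (a, b, c), (d, e, f), (g, h, i) = m
    cof = [[e * i - f * h, -(d * i - f * g), d * h - e * g],
           [-(b * i - c * h), a * i - c * g, -(a * h - b * g)],
           [b * f - c * e, -(a * f - c * d), a * e - b * d]]
    det = a * cof[0][0] + b * cof[0][1] + c * cof[0][2]
    if abs(det) < 1e-12:
        raise ValueError("matrix is singular")
    return [[cof[j][k] / det for j in range(3)] for k in range(3)]

def cayley(m):
    """(I - M)(I + M)^{-1}; by Theorem 8.4.9 it sends S to A and A to S."""
    eye = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    minus = [[eye[i][j] - m[i][j] for j in range(3)] for i in range(3)]
    plus = [[eye[i][j] + m[i][j] for j in range(3)] for i in range(3)]
    return mat_mul(minus, inv3(plus))

def skew_matrix(alpha, xi):
    """The skew-symmetric matrix (8.4.5)."""
    lam, mu, nu = xi
    t = math.tan(alpha / 2)
    return [[0.0, nu * t, -mu * t], [-nu * t, 0.0, lam * t], [mu * t, -lam * t, 0.0]]

def euler_matrix(alpha, xi):
    """The matrix T R_alpha T^{-1} written out as in Euler's equations."""
    lam, mu, nu = xi
    c, s = math.cos(alpha), math.sin(alpha)
    return [[c + lam * lam * (1 - c), lam * mu * (1 - c) - nu * s, nu * lam * (1 - c) + mu * s],
            [lam * mu * (1 - c) + nu * s, c + mu * mu * (1 - c), mu * nu * (1 - c) - lam * s],
            [nu * lam * (1 - c) - mu * s, mu * nu * (1 - c) + lam * s, c + nu * nu * (1 - c)]]

alpha, xi = 1.2, (1 / 3, 2 / 3, 2 / 3)
S = skew_matrix(alpha, xi)
A = euler_matrix(alpha, xi)
A_from_S = cayley(S)  # should agree with A
S_from_A = cayley(A)  # should agree with S
```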

Theorem 8.4.9 is concerned with the representation of a single 
rotation, but the skew-symmetric matrix of the product of two 
rotations is also easy to evaluate. 

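Theorem 8.4.10. Let R be the product of the two rotations $R_1, R_2$, carried out in that order, and suppose that none of the three rotations is a π-rotation. If the skew-symmetric matrices of $R, R_1, R_2$ are $S, S_1, S_2$ respectively, then

$$S = (I + S_1)(I + S_2S_1)^{-1}(S_1 + S_2)(I + S_1)^{-1}.$$

We know, by Theorem 8.4.9, that the orthogonal matrices of $R_1, R_2$ are

$$(I - S_1)(I + S_1)^{-1}, \qquad (I - S_2)(I + S_2)^{-1} = (I + S_2)^{-1}(I - S_2).$$

It follows that the orthogonal matrix A of R is given by

$$A = (I + S_2)^{-1}(I - S_2)(I - S_1)(I + S_1)^{-1}.$$

Hence

$$\begin{aligned} I + A &= (I + S_2)^{-1}\{(I + S_2)(I + S_1) + (I - S_2)(I - S_1)\}(I + S_1)^{-1} \\ &= 2(I + S_2)^{-1}(I + S_2S_1)(I + S_1)^{-1}, \end{aligned}$$

and similarly

$$I - A = 2(I + S_2)^{-1}(S_1 + S_2)(I + S_1)^{-1}.$$

Since R is not a π-rotation, $I + A$ is non-singular, and hence so is $I + S_2S_1$. But, by Theorem 8.4.9, $S = (I + A)^{-1}(I - A)$ (the factors $I - A$ and $(I + A)^{-1}$ commute), and so

$$\begin{aligned} S &= \tfrac12(I + S_1)(I + S_2S_1)^{-1}(I + S_2)\cdot 2(I + S_2)^{-1}(S_1 + S_2)(I + S_1)^{-1} \\ &= (I + S_1)(I + S_2S_1)^{-1}(S_1 + S_2)(I + S_1)^{-1}. \end{aligned}$$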
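The composition formula of Theorem 8.4.10 can be compared with the direct route (form $A = A_2A_1$, then take its Cayley transform). A sketch under the same assumptions as before (nested-list matrices, hypothetical helper names); the two sample rotations have perpendicular axes, so the resultant is not a π-rotation.

```python
import math

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def add(a, b):
    return [[a[i][j] + b[i][j] for j in range(3)] for i in range(3)]

def sub(a, b):
    return [[a[i][j] - b[i][j] for j in range(3)] for i in range(3)]

EYE = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]

def inv3(m):
    """Inverse of a 3x3 matrix by the adjugate (cofactor) formula."""
    (a, b, c), (d, e, f), (g, h, i) = m
    cof = [[e * i - f * h, -(d * i - f * g), d * h - e * g],
           [-(b * i - c * h), a * i - c * g, -(a * h - b * g)],
           [b * f - c * e, -(a * f - c * d), a * e - b * d]]
    det = a * cof[0][0] + b * cof[0][1] + c * cof[0][2]
    if abs(det) < 1e-12:
        raise ValueError("matrix is singular")
    return [[cof[j][k] / det for j in range(3)] for k in range(3)]

def cayley(m):
    """(I - M)(I + M)^{-1}, converting between S and A (Theorem 8.4.9)."""
    return mat_mul(sub(EYE, m), inv3(add(EYE, m)))

def skew_matrix(alpha, xi):
    lam, mu, nu = xi
    t = math.tan(alpha / 2)
    return [[0.0, nu * t, -mu * t], [-nu * t, 0.0, lam * t], [mu * t, -lam * t, 0.0]]

def composed_skew(s1, s2):
    """Right-hand side of Theorem 8.4.10: (I + S1)(I + S2 S1)^{-1}(S1 + S2)(I + S1)^{-1}."""
    i_s1 = add(EYE, s1)
    middle = mat_mul(inv3(add(EYE, mat_mul(s2, s1))), add(s1, s2))
    return mat_mul(mat_mul(i_s1, middle), inv3(i_s1))

S1 = skew_matrix(0.9, (1.0, 0.0, 0.0))
S2 = skew_matrix(1.4, (0.0, 0.6, 0.8))
A = mat_mul(cayley(S2), cayley(S1))  # R1 first, then R2: A = A2 A1
S_direct = cayley(A)
S_formula = composed_skew(S1, S2)
```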




8.4.5. The last topic to be considered in the present discussion is that of rigid motion. We restrict ourselves to motions of space for which one point remains fixed, and we naturally choose this point as the origin of coordinate systems.

Definition 8.4.5. Let $\bar{S}$ be a system of coordinates with the same origin O as the fundamental system $S$. A transformation of space which carries every point X into a point X′, whose coordinates with respect to $\bar{S}$ are the same as those of X with respect to $S$, is called a (rigid) motion. The motion will be said to be proper or improper according as $\bar{S}$ is proper or improper.

To visualize a proper motion we can imagine the axes of $S$ forming a stiff framework and being moved, without deformation, about the origin until they come into coincidence with the corresponding axes of $\bar{S}$. We then obtain a proper motion if the entire space is made to move together with the coordinate axes as one rigid body.

Theorem 8.4.11. (i) If P is the matrix of the coordinate system $\bar{S}$, then the motion specified by Definition 8.4.5 can be represented, with respect to the system $S$, by the equation

$$x' = Px. \qquad (8.4.6)$$

(ii) Conversely, any transformation of type (8.4.6), where P is orthogonal, is a proper or improper motion according as P is proper or improper.

Consider a point X, represented by the vector $x$ with respect to $S$. If the motion in question transforms X into X′, then (by definition) X′ is represented by $x$ with respect to $\bar{S}$ and so (in view of Theorem 8.4.4, p. 238) by $Px$ with respect to $S$. Thus $x' = Px$, where $x'$ is the vector representing X′ with respect to $S$.

Again, given the transformation (8.4.6), let us denote by $\bar{S}$ the system of coordinates associated with the matrix P. Then, by (i), (8.4.6) represents the motion specified in Definition 8.4.5. This motion is proper or improper according as $\bar{S}$ is proper or improper, i.e. according as $|P| = 1$ or $|P| = -1$.

We are now able to prove an important result due substantially 
to Euler (1776). 

Theorem 8.4.12. (Euler’s theorem on rigid motion) 

Every proper motion is a rotation, and conversely. 




By virtue of Theorem 8.4.11 (i), every proper motion can be represented by an equation $x' = Ax$, where A is a proper orthogonal matrix. Hence, by Theorem 8.4.7, every proper motion is a rotation. Conversely, we know by Theorem 8.4.6 that every rotation can be represented by an equation $x' = Ax$, where A is a proper orthogonal matrix. Hence, by Theorem 8.4.11 (ii), every rotation is a proper motion.

In geometrical terms Theorem 8.4.12 means that if a system of 
coordinate axes is moved in any way about the origin, subject only 
to the requirement that it is treated as a rigid body, then its motion 
is equivalent to a single rotation about a suitable axis. This result 
is of considerable importance in the dynamics of rigid bodies.†

The case of transformations involving improper orthogonal 
matrices is now easily disposed of. 

Theorem 8.4.13. If A is an improper orthogonal matrix, then the equation $x' = Ax$ represents the product of a rotation and a reflection in the origin. Conversely, the product of a rotation and a reflection in the origin can be represented by an equation $x' = Ax$, where A is an improper orthogonal matrix.

The transformation $x' = Ax$ is the product, in either order, of the transformations

$$x' = (-A)x, \qquad x' = -x. \qquad (8.4.7)$$

If A is an improper orthogonal matrix, then $-A$ is proper orthogonal, since $|-A| = (-1)^3|A| = 1$. Hence the first of these is a rotation, while the second is evidently a reflection in the origin.

Again, consider a rotation and a reflection in the origin. By Theorem 8.4.6 the rotation can be written as $x' = Bx$ with B proper orthogonal, so these transformations can be represented by equations (8.4.7), where $A = -B$ is an improper orthogonal matrix. Hence their product, in either order, is the transformation $x' = Ax$.
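A concrete instance of Theorem 8.4.13, as a small sketch with nested-list integer matrices (the names are illustrative). The reflection in the plane $z = 0$ is improper; its negative, diag(−1, −1, 1), is the π-rotation about the z-axis, and following that rotation by the reflection in the origin reproduces the original map.

```python
def det3(a):
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))

def apply(a, v):
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]

A = [[1, 0, 0], [0, 1, 0], [0, 0, -1]]  # reflection in the plane z = 0: improper
B = [[-t for t in row] for row in A]    # -A = diag(-1, -1, 1): proper, a pi-rotation about the z-axis
x = [2, -3, 5]
# The rotation x' = B x followed by the reflection x' = -x in the origin reproduces A x.
y = [-t for t in apply(B, x)]
```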

Exercise 8.4.4. Arguing as in the proof of Theorem 8.3.4 (p. 235), show 
that the equation x' = Ax, where A is an improper orthogonal matrix, 
represents the product of a rotation and a reflection in a plane through the 
origin. 

Exercise 8.4.5. Show that every improper motion is the product of a 
rotation and a reflection in the origin. 


† For a geometrical proof see Lamb, Higher Mechanics, 2-3.




PROBLEMS ON CHAPTER VIII 

1. Verify that , V3(l-i) \ 

/ 2V2 2V2 \ 

I V2+1 -l-f-W2 I 

\ 2 2V3 / 

is a unitary matrix, and evaluate its determinant. 

2. Let $\epsilon_1, \ldots, \epsilon_n$ be complex numbers of modulus 1. Show that, if the rows of a unitary matrix are multiplied by $\epsilon_1, \ldots, \epsilon_n$ respectively, then the resulting matrix is again unitary.

3. The rows of a matrix A form an orthogonal set. Show by an example 
that the columns of A need not form an orthogonal set.

4. Show that, if an orthogonal matrix is triangular, then it is diagonal; 
and that all its diagonal elements are equal to $\pm 1$.

5. Let U be a complex matrix, and write $U = P + iQ$, where P, Q are real. Show that U is unitary if and only if $P^TQ$ is symmetric and $P^TP + Q^TQ = I$.

6. Show that any two of the following three statements relating to a 
matrix A imply the third: (i) A is hermitian; (ii) A is unitary; (iii) $A^2 = I$.

7. Show that Theorem 8.1.11 (p. 228) is no longer valid if the words ‘real’ and ‘orthogonal’ are replaced by ‘complex’ and ‘unitary’ respectively.

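8. Let $\alpha$ be an angle such that $\cos\frac12\alpha \ne 0$, and let

$$A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\alpha & -\sin\alpha \\ 0 & \sin\alpha & \cos\alpha \end{pmatrix}.$$

Verify that $I + A$ is non-singular, and show that

$$(I - A)(I + A)^{-1} = \tan\tfrac12\alpha\begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & -1 & 0 \end{pmatrix}.$$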

9. Let A be a proper orthogonal matrix of order 3. Show that there exists a number t such that $-1 \le t \le 3$ and

$$A^3 - tA^2 + tA - I = O.$$

10. Let {r,s = l,...,n). Show that the matrix (a^g) 

is unitary. 

11. A matrix A satisfies the equation A^A = — A. Show that the value 
of each characteristic root of A is 0 or — 1. 

12. U is a unitary matrix such that |U—11 ^ 0, and H is defined by the 

equation = (U-f I)(U—I)”^. Prove that H is a hermitian matrix. If 
the characteristic roots of U are find the characteristic roots of H. 

13. If S is real symmetric and T real skew-symmetric, show that

$$|I - T - iS| \ne 0$$

and that the matrix

$$U = (I + T + iS)(I - T - iS)^{-1}$$

is unitary. Find the characteristic roots of U, when




-c :)• -=(: '9- 


14. Let A be a real symmetric and S a real skew-symmetric matrix, and suppose that $AS = SA$, $|A - S| \ne 0$. Prove that $(A + S)(A - S)^{-1}$ is orthogonal.

Construct an orthogonal matrix by taking $A = I_3$ and

0 1 0 \ 


S = 


-1 

0 


15. Let A and B be commuting orthogonal matrices such that $A + I$ and $B + I$ are non-singular. Prove that

$$(AB - I)(AB + A + B + I)^{-1}$$

is a skew-symmetric matrix.

16. Let A and B be orthogonal matrices and suppose that $|A| = -|B|$.
Show that A + B is singular. 

17. Let be an orthogonal matrix^and suppose that x 0. Show 

that $-1 < a < 1$, and prove that $A + bxy^T$ is orthogonal if and only if $b = (1 - a)^{-1}$ or $b = -(1 + a)^{-1}$.

18. Writing $N(A) = \{\operatorname{tr}(\bar{A}^TA)\}^{1/2}$, show that, for every matrix A and every unitary matrix U,

$$N(UA) = N(AU) = N(A), \qquad N(A - U) = N(I - U^{-1}A).$$

19. Let U be a unitary matrix, and $A = \mathrm{dg}(a_1, \ldots, a_n)$. Show that any characteristic root $\omega$ of UA satisfies the inequalities

$$m \le |\omega| \le M,$$

where $M = \max_{1 \le j \le n}|a_j|$ and $m = \min_{1 \le j \le n}|a_j|$.

20. Let S, T be proper orthogonal 3×3 matrices with the same first column. Show that, for any angle $\alpha$, $SR_\alpha S^{-1} = TR_\alpha T^{-1}$; and interpret this result geometrically.

21. A rotation through an angle $\alpha$ is said to be infinitesimal if $\alpha^2$ is negligible. Use Euler’s equations of transformation to show that any two infinitesimal rotations commute with each other.

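22. Let $P = (\lambda, \mu, \nu)$ be a point at unit distance from the origin; let A be the orthogonal matrix of the $\alpha$-rotation about $\overrightarrow{OP}$; and put

$$T = \begin{pmatrix} 0 & \nu & -\mu \\ -\nu & 0 & \lambda \\ \mu & -\lambda & 0 \end{pmatrix}.$$

Show that

$$A = I - \sin\alpha\cdot T + (1 - \cos\alpha)T^2, \qquad \tfrac12(A - A^T) = -\sin\alpha\cdot T.$$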

23. If P is the point $(1, 1, 1)$, find the orthogonal matrix which represents the rotation of space through 45° about the line $\overrightarrow{OP}$.





24. The orthogonal matrix of a rotation is given by

$$\begin{pmatrix} 1/\sqrt{2} & 1/\sqrt{2} & 0 \\ 1/2 & -1/2 & 1/\sqrt{2} \\ 1/2 & -1/2 & -1/\sqrt{2} \end{pmatrix}.$$

Find the angle and the axis of rotation.

25. Let A be the proper orthogonal 3×3 matrix corresponding to the $\theta$-rotation of space about a straight line through the origin. Show that

$$\cos\tfrac12\theta = \tfrac12\sqrt{1 + \operatorname{tr}A}.$$

26. Let a fixed coordinate system be given. The entire space is made to carry out, in turn, an $\alpha$-rotation about the x-axis, a $\beta$-rotation about the y-axis, and a $\gamma$-rotation about the z-axis. Show that the angle $\theta$ of the resultant rotation is given by

$$2\cos\theta = \cos\beta\cos\gamma + \cos\gamma\cos\alpha + \cos\alpha\cos\beta + \sin\alpha\sin\beta\sin\gamma - 1.$$

27. The entire space is made to carry out an $\alpha$-rotation about the x-axis and then a $\beta$-rotation about the y-axis of a fixed coordinate system. Show that, if $0 < \alpha, \beta < \pi$, then the resultant transformation of space is a rotation through the angle $2\cos^{-1}(\cos\frac12\alpha\cos\frac12\beta)$ about a line whose direction ratios are $\tan\frac12\alpha$, $\tan\frac12\beta$, $-\tan\frac12\alpha\tan\frac12\beta$.

28. Show that, if S is the skew-symmetric matrix of a $\theta$-rotation and $0 < \theta < \pi$, then $2\tan^2\frac12\theta = -\operatorname{tr}(S^2)$.

29. Let $S_1, S_2$ be the skew-symmetric matrices of two rotations carried out successively, and suppose that $I + S_2S_1$ is non-singular. Show that $\sigma = \tan^2\frac12\theta$, where $\theta$ is the angle of the resultant rotation and $\sigma$ is the sum of 2-rowed principal minors of $(S_1 + S_2)(I + S_2S_1)^{-1}$.

30. Let the directed lines $l_1, l_2$ through the origin make an angle $\phi$ with each other, and suppose that an $\alpha_1$-rotation of space about $l_1$ is followed by an $\alpha_2$-rotation about $l_2$. Show that the angle $\theta$ of the resultant rotation is given by

$$\cos\tfrac12\theta = \cos\tfrac12\alpha_1\cos\tfrac12\alpha_2 - \cos\phi\sin\tfrac12\alpha_1\sin\tfrac12\alpha_2.$$


31. Let $\sigma(A)$ denote the sum of the squares of the elements of A. Show that, for every orthogonal matrix P, $\sigma(P^TAP) = \sigma(A)$.

32. Let $\sigma(A)$ be defined as in the preceding question. If the matrix S is such that, for all A,

$$\sigma(S^{-1}AS) = \sigma(A),$$

show that S is a scalar multiple of an orthogonal matrix.

33. Show that every unitary 2x2 matrix can be expressed in the form 


4 4 Bx(^osd —sin 

dg(e-. cS 




where a, j8, y, 6 are real numbers. 

34. Use Theorem 2.6.6 (Schmidt’s orthogonalization process) to show that every non-singular matrix A can be expressed in the form $A = U_1\Delta_1$, where $U_1$ is unitary and $\Delta_1$ triangular, and also in the form $A = \Delta_2U_2$, where $U_2$ is unitary and $\Delta_2$ triangular.

35. Show that, if

$$X^TAX = A, \qquad |I + X| \ne 0, \qquad Y = (I - X)(I + X)^{-1},$$

then $AY + Y^TA = O$.




Find the most general real matrix X satisfying the relations $X^TAX = A$, $|I + X| \ne 0$, in each of the following cases: (i) $A = \mathrm{dg}(a, b)$ $(a \ne 0, b \ne 0)$; (ii) $A = I_2$; (iii) $A = \mathrm{dg}(1, -1)$; (iv) $A = I_3$; (v) $A = \mathrm{dg}(1, 1, -1)$.

36. Show, by using Laplace’s expansion theorem (Theorem 1.4.4) and Jacobi’s theorem (Theorem 1.6.3), that the sum of the squares of all r-rowed minors formed from r given rows of an orthogonal matrix is equal to 1.

37. Show that all characteristic roots of any square submatrix of a unitary matrix are, in modulus, less than or equal to 1. Also show that any minor of a unitary matrix is, in modulus, less than or equal to 1.



IX 


GROUPS 

In the course of our previous discussion we have repeatedly met 
classes of matrices—such as the class of non-singular diagonal 
matrices or that of orthogonal matrices—which have the property 
that if two matrices belong to the class in question, then so do their 
inverses and their product. Since sets of objects having a similar 
structure occur frequently both in algebra and in other branches 
of mathematics, it is important to isolate and to study the common 
properties of all such sets. We are, in this way, led to introduce the 
concept of a group; and the object of the present chapter is to 
explain and illustrate this concept. We do not intend here to prove 
any general theorems on groups, but merely to exhibit the basic 
notions so as to make the language of the theory of groups available 
in the subsequent discussion of matrices and transformations.†

9.1. The axioms of group theory 

9.1.1. Let S be a set of elements, and suppose that some rule is given whereby with each ordered pair of elements a, b in S we associate a new object, which we denote by ab and which may or may not be an element of S. Such a rule is known as a rule of composition, and since we use the product notation for denoting the object constructed from a and b, we also call it multiplication. It is, of course, important to bear in mind that ab is, in general, distinct from ba.

Definition 9.1.1. Let G be a set of elements a, b, c, ..., and let R be a rule of composition (called multiplication), defined for all ordered pairs of elements of G. Then G is a group (with respect to R) if the following four axioms (the group axioms) are satisfied.

(i) If a, b are any two (distinct or equal) elements of G, then ab is also an element of G.

(ii) For all $a, b, c \in G$,

$$(ab)c = a(bc).$$

† For a systematic exposition of the theory of groups see Ledermann, Introduction to the Theory of Finite Groups, or, if a more advanced treatment is required, van der Waerden, Modern Algebra, vol. i, chaps. ii and vi.



THE AXIOMS OF GROUP THEORY 


253 


IX, § 9.1 

(iii) G contains an element e, called a unit element, such that, for all $a \in G$,

$$ae = ea = a.$$

(iv) To every element $a \in G$ there corresponds an element $a^{-1}$, called an inverse element of a, such that

$$aa^{-1} = a^{-1}a = e.$$

Axiom (i) states simply that G is closed with respect to multiplication, and axiom (ii) that multiplication is associative. Axiom (iii) asserts the existence of an element having certain special properties, and axiom (iv) the existence of a special element associated with each element of the group.
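For a finite set, the four axioms can be checked mechanically by brute force over all pairs and triples of elements. In the Python sketch below, `is_group` is an illustrative helper and not part of any library. It assumes that the elements are hashable and that the rule of composition is given as a two-argument function.

```python
from itertools import product

def is_group(elements, op):
    """Brute-force test of axioms (i)-(iv) of Definition 9.1.1 for a finite set."""
    elements = list(elements)
    members = set(elements)
    # (i) closure
    if any(op(a, b) not in members for a, b in product(elements, repeat=2)):
        return False
    # (ii) associativity
    if any(op(op(a, b), c) != op(a, op(b, c))
           for a, b, c in product(elements, repeat=3)):
        return False
    # (iii) a unit element e with ae = ea = a for all a
    units = [e for e in elements if all(op(a, e) == a == op(e, a) for a in elements)]
    if not units:
        return False
    e = units[0]
    # (iv) every a has some x with ax = xa = e
    return all(any(op(a, x) == e == op(x, a) for x in elements) for a in elements)
```

For instance, 1, 2, 3, 4 under multiplication modulo 5 pass the test. Adding 0 fails axiom (iv), and 1, ..., 5 modulo 6 fail closure because $2 \cdot 3 \equiv 0$.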

A group, then, is a set G of elements with a structure imposed 
on it by the rule of composition. This rule is often referred to as the 
group operation. One of the simplest instances of a group is provided 
by the set of all real numbers other than zero; this set is a group with 
ordinary multiplication as its group operation. 

Exercise 9.1.1. Verify this statement. 

It is easy to see that the elements whose existence is postulated 
by axioms (iii) and (iv) are, in fact, unique. 

Theorem 9.1.1. Each group contains precisely one unit element, 
and each element of a group possesses precisely one inverse element. 

The validity of this theorem will enable us to speak of the unit 
element and of the inverse of any given element. 

Let both e and e′ be unit elements of a group G, so that, for all $a \in G$,

$$ae = ea = a, \qquad ae' = e'a = a.$$

This implies, in particular, that

$$e = e'e = e',$$

and the uniqueness of the unit element is therefore established. Next, let both x and y be inverses of an element a. Then

$$ax = e, \qquad ya = e.$$

Premultiplying the first equation by y, we obtain

$$y(ax) = ye.$$

Hence, by axioms (ii) and (iii),

$$(ya)x = y,$$

and so

$$x = ex = (ya)x = y.$$



The inverse of a is therefore unique, and the symbol $a^{-1}$ is thus unambiguously defined.

Exercise 9.1.2. Show that the unit element is its own inverse. 

Exercise 9.1.3. Let a and b be any elements of a group G. Show that the equation $ax = b$ possesses the unique solution $x = a^{-1}b$, and obtain the corresponding result for the equation $xa = b$.

Definition 9.1.2. A group G is abelian if any two elements commute with each other, i.e. if, for all $a, b \in G$, $ab = ba$.

Definition 9.1.3. A group is finite or infinite according as it 
possesses a finite or infinite number of elements. In the former case 
the number of elements is called the order of the group. 

Definition 9.1.4. Let G be a group and H a subset of G. If H is itself a group with respect to the same rule of composition as G, then H is said to be a subgroup of G.

Trivial examples of subgroups of G are G itself and the group consisting solely of the unit element of G.

Exercise 9.1.4. Let G be a group, and let Z be the set of all those elements of G which commute with every element of G. Show that Z, which is known as the centre of G, is an abelian subgroup of G.

9.1.2. Mathematics affords many instances of concrete systems 
which possess the group structure, and in order to give some idea 
of the importance of the theory of groups we shall now mention a 
few examples. 

The most obvious instances of group structure are provided by groups whose elements are numbers. Thus the rational numbers, the real numbers, and the complex numbers, with 0 excluded in each case, form infinite abelian groups with respect to ordinary multiplication. In each case the unit element is 1, and the inverse of a is $1/a = a^{-1}$. It is precisely this fact which gives rise to the notation and terminology in axioms (iii) and (iv) of Definition 9.1.1. It may be noted in passing that the first of the three groups mentioned above is a subgroup of the second, and the second is a subgroup of the third. Again, the integers, the rational numbers, the real numbers, and the complex numbers all form infinite abelian groups with respect to addition as group operation.† The unit element is now 0, and the inverse of a is −a. Each group is again a subgroup of all those mentioned later in the list. All groups

† Thus the term ‘multiplication’ in the sense of the group operation is here interpreted as ordinary addition.





enumerated so far have been infinite. An instance of a finite group of numbers is furnished by the mth roots of unity $\varepsilon_k = e^{2\pi i k/m}$ $(k = 0, 1, \ldots, m-1)$. These numbers form an abelian group of order m with respect to ordinary multiplication. The unit element here is $1 = \varepsilon_0$, and the inverse of $\varepsilon_k$ is $\varepsilon_{m-k}$. In particular, the numbers 1, −1 form a multiplicative group, as do the numbers 1, i, −1, −i.

Again, let m be a positive integer and denote by $C_x$ the residue class consisting of all integers congruent to x (mod m). Addition and multiplication of residue classes can be specified by the formulae

$$C_x + C_y = C_{x+y}, \qquad C_xC_y = C_{xy},$$

which can easily be shown to lead to unambiguous definitions. The m residue classes (mod m) then form an abelian group with respect to addition. In this group $C_0$ is the unit element, and $C_{-x}$ is the inverse of $C_x$. However, we obtain a much more interesting group by taking multiplication of residue classes as the rule of composition. A residue class (mod m) is said to be prime to m if some number (and therefore every number) in it is prime to m. It is easy to show that the residue classes (mod m) prime to m form an abelian group with respect to multiplication as defined above. The unit element is now $C_1$, and the inverse of $C_x$ is the residue class $C_{x'}$, where x′ is any number such that $x'x \equiv 1 \pmod m$. This group is of great importance in the theory of numbers.
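The inverse class $C_{x'}$ can be computed directly. The Python sketch below represents each residue class prime to m by its least positive member and relies on the built-in `pow(x, -1, m)`, which returns x′ with $x'x \equiv 1 \pmod m$ and requires Python 3.8 or later. The helper names are illustrative.

```python
from math import gcd

def units_mod(m):
    """Least positive representatives of the residue classes (mod m) prime to m (m >= 2)."""
    return [x for x in range(1, m) if gcd(x, m) == 1]

def inverse_mod(x, m):
    """A number x' with x'x = 1 (mod m); raises ValueError if x is not prime to m."""
    return pow(x, -1, m)

units = units_mod(12)
# closure: the product of two classes prime to 12 is again prime to 12
closed = all(a * b % 12 in units for a in units for b in units)
```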

Many groups have functions as their elements. In this case the most natural rule of composition for the functions f(x), g(x) is the formation of the composite function g{f(x)}, which, in accordance with Definition 6.2.1 (p. 172), we call the product of f(x) and g(x) and denote by the symbol gf. In view of Theorem 6.2.1, we have $f(gh) = (fg)h$, so that with our definition of products the associative law is satisfied. Consider, for instance, the set G of all functions f(x) continuous and strictly increasing in the interval $0 \le x \le 1$, and such that $f(0) = 0$, $f(1) = 1$. Then $f, g \in G$ implies $fg \in G$. Again, the function $e(x) = x$ belongs to G and satisfies the equation $fe = ef = f$ for all $f \in G$. Finally, if $f^{-1}$ denotes the (unique) functional inverse of $f \in G$, then clearly $f^{-1} \in G$ and $ff^{-1} = f^{-1}f = e$. Hence G is a group, and it is evidently infinite and non-abelian.

Some of the most important groups have as their elements 
geometrical transformations of various types. In accordance with 
Definition 6.2.1 we shall continue to refer to the resultant of two 




transformations as their product. More precisely, if $T_1$ and $T_2$ are transformations, we shall denote by $T_2T_1$ the transformation obtained by first applying $T_1$ and then $T_2$.

The translations of a plane (or of space) clearly constitute a group, but a more interesting group is formed by rotations. If α is any real number, let us denote by α′ the unique number such that $0 \le \alpha' < 2\pi$ and $\alpha' - \alpha$ is an integral multiple of 2π. Furthermore, if $0 \le \alpha < 2\pi$, let us denote by R(α) the rotation of the plane, in a counterclockwise sense, through the angle α about a fixed origin. Then, for $0 \le \alpha, \beta < 2\pi$, we have

$$R(\beta)R(\alpha) = R((\alpha+\beta)').$$

The rotations of the plane thus form an infinite abelian group; its unit element is R(0) and the inverse of R(α) is R((−α)′). More generally, it is easy to see that the set of all euclidean collineations (i.e. the set of all transformations composed of translations, rotations, and reflections) is an infinite non-abelian group.
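The relation $R(\beta)R(\alpha) = R((\alpha+\beta)')$ can be checked numerically once each rotation is represented by its 2×2 matrix. The Python sketch below assumes NumPy and the counterclockwise convention of the text. The function `reduce_angle` is an illustrative name for the map $\alpha \mapsto \alpha'$.

```python
import math
import numpy as np

def reduce_angle(a):
    """The number a' with 0 <= a' < 2*pi and a' - a an integral multiple of 2*pi."""
    return a % (2 * math.pi)

def rotation(a):
    """Matrix of the counterclockwise rotation of the plane through the angle a."""
    c, s = math.cos(a), math.sin(a)
    return np.array([[c, -s], [s, c]])

alpha, beta = 4.0, 5.5
lhs = rotation(beta) @ rotation(alpha)        # first R(alpha), then R(beta)
rhs = rotation(reduce_angle(alpha + beta))    # R((alpha + beta)')
```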

Exercise 9.1.5. Show that the rotations of space about a fixed point form an infinite non-abelian group.

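9.1.3. By a permutation of degree n is meant the operation of changing the order of n given distinct objects, say the numbers 1, 2, ..., n. Such a permutation is, in other words, the operation of replacing one arrangement $(\lambda_1, \ldots, \lambda_n)$ of $(1, \ldots, n)$ by a second arrangement $(\mu_1, \ldots, \mu_n)$. We represent this permutation by the symbol

$$\begin{pmatrix}\lambda_1 & \cdots & \lambda_n\\ \mu_1 & \cdots & \mu_n\end{pmatrix}.$$

If in this symbol the λ’s and μ’s are rearranged in the same way, then the new symbol will clearly represent the same permutation; that is, if $(k_1, \ldots, k_n)$ is any arrangement of $(1, \ldots, n)$, then

$$\begin{pmatrix}\lambda_1 & \cdots & \lambda_n\\ \mu_1 & \cdots & \mu_n\end{pmatrix} = \begin{pmatrix}\lambda_{k_1} & \cdots & \lambda_{k_n}\\ \mu_{k_1} & \cdots & \mu_{k_n}\end{pmatrix}.$$

For example

$$\begin{pmatrix}3&2&4&1\\1&3&2&4\end{pmatrix} = \begin{pmatrix}2&4&3&1\\3&2&1&4\end{pmatrix} = \begin{pmatrix}1&2&3&4\\4&3&1&2\end{pmatrix}.$$

It is clearly always possible to take 1, ..., n (or any other prescribed arrangement of 1, ..., n) as the upper row of the symbol representing any given permutation of degree n.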





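The identical permutation e is the permutation which leaves unaltered every arrangement, i.e.

$$e = \begin{pmatrix}\lambda_1 & \cdots & \lambda_n\\ \lambda_1 & \cdots & \lambda_n\end{pmatrix}.$$

The product qp of the two permutations p and q is defined as the permutation resulting from first carrying out p and then q. Thus

$$\begin{pmatrix}\mu_1 & \cdots & \mu_n\\ \nu_1 & \cdots & \nu_n\end{pmatrix}\begin{pmatrix}\lambda_1 & \cdots & \lambda_n\\ \mu_1 & \cdots & \mu_n\end{pmatrix} = \begin{pmatrix}\lambda_1 & \cdots & \lambda_n\\ \nu_1 & \cdots & \nu_n\end{pmatrix}.$$

The n! permutations of degree n form a group with respect to multiplication, the symmetric group $S_n$. For the identical permutation acts as the unit element; the inverse of

$$\begin{pmatrix}\lambda_1 & \cdots & \lambda_n\\ \mu_1 & \cdots & \mu_n\end{pmatrix}$$

is the permutation

$$\begin{pmatrix}\mu_1 & \cdots & \mu_n\\ \lambda_1 & \cdots & \lambda_n\end{pmatrix},$$

and it is easy to verify that the associative law is satisfied. For $n > 2$ the group $S_n$ is non-abelian.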


Exercise 9.1.6. Prove the last statement by considering permutations of the form

$$\begin{pmatrix}1 & 2 & 3 & 4 & 5 & \cdots & n\\ \lambda & \mu & \nu & 4 & 5 & \cdots & n\end{pmatrix},$$

where (λ, μ, ν) is an arrangement of (1, 2, 3).
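The text allows the upper row of each symbol to be the natural arrangement, so a permutation can be stored as its lower row alone. The Python sketch below makes that choice. It also assumes 0-based labels 0, ..., n − 1, so `p[i]` is the image of i. `compose(q, p)` is the product qp, in which p is carried out first. All helper names are illustrative.

```python
from itertools import permutations

def compose(q, p):
    """Product qp: first carry out p, then q (p[i] is the image of i)."""
    return tuple(q[p[i]] for i in range(len(p)))

def inverse(p):
    """Inverse permutation: interchange the two rows of the symbol."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

S3 = list(permutations(range(3)))   # the 3! elements of the symmetric group
identity = (0, 1, 2)
p = (1, 0, 2)                       # interchanges 0 and 1
q = (0, 2, 1)                       # interchanges 1 and 2
# qp and pq differ: compose(q, p) gives (2, 0, 1), compose(p, q) gives (1, 2, 0)
```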


In addition to the symmetric group, which comprises all permutations (of degree n), there are other groups of permutations, all of them naturally subgroups of the symmetric group. Thus, for instance, the four permutations of degree 4,

(9.1.1)

form an abelian subgroup of $S_4$, in which all elements are powers of a single element,† since

$$b = a^2, \qquad c = a^3, \qquad e = a^4.$$


† Powers of elements of a group are unambiguously defined in view of the associativeness of multiplication.



Another abelian subgroup of $S_4$ is formed by the permutations

(9.1.2)

and here we have $a^2 = b^2 = c^2 = e$, $bc = a$, $ca = b$, $ab = c$.

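We can now give an interesting interpretation, in terms of permutations, of the ε-symbol of § 1.1. Let $(\lambda_1, \ldots, \lambda_n)$ and $(\mu_1, \ldots, \mu_n)$ be arrangements of $(1, \ldots, n)$. Then, in view of Theorem 1.1.2 (p. 3),

$$\varepsilon\begin{pmatrix}\lambda_1 & \cdots & \lambda_n\\ \mu_1 & \cdots & \mu_n\end{pmatrix} \qquad (9.1.3)$$

is a function of the permutation

$$p = \begin{pmatrix}\lambda_1 & \cdots & \lambda_n\\ \mu_1 & \cdots & \mu_n\end{pmatrix}, \qquad (9.1.4)$$

and we therefore simply write ε(p) for the symbol (9.1.3). The function ε(p) assumes the values +1 and −1; and it satisfies, moreover, the equation

$$\varepsilon(qp) = \varepsilon(q)\,\varepsilon(p) \qquad (9.1.5)$$

for all permutations p, q. For let p be given by (9.1.4) and write

$$q = \begin{pmatrix}\mu_1 & \cdots & \mu_n\\ \nu_1 & \cdots & \nu_n\end{pmatrix}.$$

Then

$$qp = \begin{pmatrix}\lambda_1 & \cdots & \lambda_n\\ \nu_1 & \cdots & \nu_n\end{pmatrix}$$

and

$$\begin{aligned}\varepsilon(q)\,\varepsilon(p) &= \operatorname{sgn}\prod_{1\le r<s\le n}(\mu_s-\mu_r)\cdot\operatorname{sgn}\prod_{1\le r<s\le n}(\nu_s-\nu_r)\times\operatorname{sgn}\prod_{1\le r<s\le n}(\lambda_s-\lambda_r)\cdot\operatorname{sgn}\prod_{1\le r<s\le n}(\mu_s-\mu_r)\\ &= \operatorname{sgn}\prod_{1\le r<s\le n}(\lambda_s-\lambda_r)\cdot\operatorname{sgn}\prod_{1\le r<s\le n}(\nu_s-\nu_r) = \varepsilon(qp),\end{aligned}$$

since the two factors involving the μ’s are equal, and their product is therefore +1.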

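A particularly simple type of permutation is a transposition, i.e. the interchange of two numbers of an arrangement. Any transposition t can clearly be written in the form

$$t = \begin{pmatrix}1 & \cdots & r & \cdots & s & \cdots & n\\ 1 & \cdots & s & \cdots & r & \cdots & n\end{pmatrix},$$

where $r < s$; and it therefore follows by Theorem 1.1.3 (p. 4) that

$$\varepsilon(t) = -1. \qquad (9.1.6)$$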




It is easy to see that every permutation can be expressed (though not in a unique way) as a product of transpositions. Let, then, a permutation p be written in the form $p = t_1t_2\cdots t_k$, where $t_1, \ldots, t_k$ are transpositions. By (9.1.5) and (9.1.6) we have

$$\varepsilon(p) = \varepsilon(t_1)\cdots\varepsilon(t_k) = (-1)^k. \qquad (9.1.7)$$

This shows that, whenever a given permutation is expressed as a product of transpositions, the number of factors in the product is either always even or always odd.† It is natural, therefore, to call a permutation even or odd according as it is the product of an even or an odd number of transpositions. Equation (9.1.7) then shows that $\varepsilon(p) = +1$ or $-1$ according as p is even or odd.
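When the upper row is the natural arrangement, the factor $\operatorname{sgn}\prod(\lambda_s-\lambda_r)$ equals +1. Then ε(p) reduces to $(-1)^N$, where N is the number of inversions in the lower row, that is, the number of pairs $r < s$ with $\mu_r > \mu_s$. The minimal Python sketch below computes ε in this way and checks (9.1.5) over all of $S_4$. It uses 0-based labels, and its helper names are illustrative.

```python
from itertools import combinations, permutations

def sign(p):
    """epsilon(p) for p = (0 ... n-1 over p[0] ... p[n-1]): (-1) to the number of inversions."""
    inversions = sum(1 for r, s in combinations(range(len(p)), 2) if p[r] > p[s])
    return -1 if inversions % 2 else 1

def compose(q, p):
    """Product qp: first p, then q."""
    return tuple(q[p[i]] for i in range(len(p)))

S4 = list(permutations(range(4)))
multiplicative = all(sign(compose(q, p)) == sign(q) * sign(p) for p in S4 for q in S4)
even = [p for p in S4 if sign(p) == 1]    # the even permutations of degree 4
```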

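We now call the arrangement $(\lambda_1, \ldots, \lambda_n)$ of $(1, \ldots, n)$ even or odd according as the permutation

$$\begin{pmatrix}1 & \cdots & n\\ \lambda_1 & \cdots & \lambda_n\end{pmatrix}$$

is even or odd. It follows that in the expression

$$D = \sum_{(\lambda_1, \ldots, \lambda_n)} \varepsilon(\lambda_1, \ldots, \lambda_n)\, a_{1\lambda_1} \cdots a_{n\lambda_n}$$

for the determinant $D = |a_{ij}|_n$, the symbol $\varepsilon(\lambda_1, \ldots, \lambda_n)$ has the value +1 or −1 according as $(\lambda_1, \ldots, \lambda_n)$ is an even or odd arrangement.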
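The expansion of D can be transcribed directly into code. The Python sketch below sums over all arrangements and uses the parity of the number of inversions as ε. It assumes 0-based indices, and `leibniz_det` is an illustrative name. The sum has n! terms, so a sketch like this suits only small checks, not practical computation.

```python
from itertools import combinations, permutations
from math import prod

def sign(arrangement):
    """+1 for an even arrangement of 0, ..., n-1 and -1 for an odd one."""
    inversions = sum(1 for r, s in combinations(range(len(arrangement)), 2)
                     if arrangement[r] > arrangement[s])
    return -1 if inversions % 2 else 1

def leibniz_det(a):
    """D = sum over arrangements lam of eps(lam) * a[0][lam[0]] * ... * a[n-1][lam[n-1]]."""
    n = len(a)
    return sum(sign(lam) * prod(a[i][lam[i]] for i in range(n))
               for lam in permutations(range(n)))
```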


9.1.4. We recall that structural identity of two linear manifolds was referred to as an isomorphism between them.‡ A similar terminology is used in the theory of groups.

Definition 9.1.5. Two groups G, G′ are isomorphic (in symbols: $G \cong G'$) if a biunique correspondence $a \leftrightarrow a'$ $(a \in G,\ a' \in G')$ can be set up between them in such a way that, whenever $a \leftrightarrow a'$, $b \leftrightarrow b'$, we have $ab \leftrightarrow a'b'$. The correspondence itself is then called an isomorphism between G and G′.

In other words, $G \cong G'$ if a biunique correspondence can be set up between G and G′ such that, for all $a, b \in G$, $(ab)' = a'b'$, where x′ denotes the (unique) element of G′ which corresponds to $x \in G$. As in the case of linear manifolds we can also express the same idea in terms of the functional notation. An isomorphism between G and G′ is a function φ(a) defined uniquely for all $a \in G$ and having the following properties: (i) φ(a) is an element of G′ for all $a \in G$;


† Cf. Exercise 1.1.3 (p. 3).


‡ Definition 2.4.1 (p. 58).




(ii) given any element $a' \in G'$, there exists precisely one element $a \in G$ such that $\varphi(a) = a'$; (iii) for all $a, b \in G$ we have $\varphi(ab) = \varphi(a)\varphi(b)$.

Definition 9.1.5, or either of the two equivalent definitions, expresses in precise language the requirement that the groups G and G′ should possess the same structure. Two isomorphic groups will, of course, have in common all properties which are purely structural in character and are independent of the nature of the elements. For example, if one of them possesses exactly 3 subgroups of order 4, then so does the other.

Exercise 9.1.7. Show that isomorphism between groups is an equivalence relation.

Exercise 9.1.8. Let G, G′ be isomorphic groups with unit elements e, e′ respectively. (i) Show that $e \leftrightarrow e'$. (ii) Show that, if $a \leftrightarrow a'$ and $a^k = e$, then $a'^k = e'$.

A useful device for exhibiting the structure of a finite group is the multiplication table, which is a square array displaying the product of every pair of elements. Consider, for example, the symmetric group $S_3$. Its elements are



and its multiplication table is as follows: 



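$$\begin{array}{c|cccccc}
 & e & p & q & r & s & t\\ \hline
e & e & p & q & r & s & t\\
p & p & e & s & t & q & r\\
q & q & t & e & s & r & p\\
r & r & s & t & e & p & q\\
s & s & r & p & q & t & e\\
t & t & q & r & p & e & s
\end{array}$$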

Here the product xy is the element standing in the row corresponding to x and the column corresponding to y, e.g. $qt = p$, $tq = r$.

It is clear that two finite groups are isomorphic if and only if they 
have the same multiplication table, provided, of course, that the 




elements in each group are suitably labelled. Thus the set of the six functions

$$e(x) = x,\quad p(x) = \frac{1}{x},\quad q(x) = 1-x,\quad r(x) = \frac{x}{x-1},\quad s(x) = \frac{1}{1-x},\quad t(x) = \frac{x-1}{x}$$

is a group, if multiplication of functions is defined as on p. 255. It is, in fact, easy to verify that this group has the same multiplication table as $S_3$ and so is isomorphic to $S_3$.
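The verification can be mechanized with exact rational arithmetic. The Python sketch below makes two assumptions. First, each function is identified by its values at a few sample points, which here are already distinct at x = 3. Second, the sample points avoid 0 and 1, and these six functions never take the values 0 or 1 at such points, so no division by zero occurs. The product gf is formed as $x \mapsto g(f(x))$, as on p. 255. The names are illustrative.

```python
from fractions import Fraction

# The six functions of the text.
FUNCS = {
    "e": lambda x: x,
    "p": lambda x: 1 / x,
    "q": lambda x: 1 - x,
    "r": lambda x: x / (x - 1),
    "s": lambda x: 1 / (1 - x),
    "t": lambda x: (x - 1) / x,
}
ORDER = "epqrst"
# Exact rational sample points (none equal to 0 or 1).
SAMPLES = [Fraction(3), Fraction(-5, 7), Fraction(11, 4)]

def signature(f):
    return tuple(f(x) for x in SAMPLES)

NAME_OF = {signature(f): name for name, f in FUNCS.items()}

def product(g, f):
    """Name of the product gf, i.e. of x -> g(f(x)) (f applied first)."""
    return NAME_OF[signature(lambda x: FUNCS[g](FUNCS[f](x)))]

# Row g, column f holds gf, as in the table for S_3.
table = {g: "".join(product(g, f) for f in ORDER) for g in ORDER}
```

Comparing `table` row by row with the multiplication table of $S_3$ given above exhibits the isomorphism under the labelling used there.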

Again, the group of permutations given by (9.1.2) has the following multiplication table:

$$\begin{array}{c|cccc}
 & e & a & b & c\\ \hline
e & e & a & b & c\\
a & a & e & c & b\\
b & b & c & e & a\\
c & c & b & a & e
\end{array}$$

Any group whose structure is specified by this table is known as a Klein four-group. Further instances of groups of this type are provided by the group of functions

$$e(x) = x, \qquad a(x) = -x, \qquad b(x) = \frac{1}{x}, \qquad c(x) = -\frac{1}{x},$$

and by the multiplicative group of residue classes (mod 8) prime to 8. On the other hand, the group of permutations (9.1.1) is not a Klein four-group.

Exercise 9.1.9. Verify these statements. 
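A group of order 4 in which every element x satisfies $x^2 = e$ cannot be cyclic. Its table is therefore the one above, up to labelling. This makes part of Exercise 9.1.9 easy to check numerically. The Python sketch below computes the order of each residue class prime to m, and its helper names are illustrative. Modulo 8, every class has order at most 2. Modulo 5, the class of 2 has order 4, so that group is cyclic.

```python
from math import gcd

def units(m):
    """Least positive residues prime to m."""
    return [x for x in range(1, m) if gcd(x, m) == 1]

def element_order(x, m):
    """Smallest k >= 1 with x**k = 1 (mod m); x must be prime to m, else this never ends."""
    k, y = 1, x % m
    while y != 1:
        y = y * x % m
        k += 1
    return k

def is_klein_four(m):
    """True if the classes (mod m) prime to m form a group of order 4 with x*x = 1 for all x."""
    u = units(m)
    return len(u) == 4 and all(element_order(x, m) <= 2 for x in u)
```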


9.2. Matrix groups and operator groups 

9.2.1. Among the groups that have been studied most extensively are groups of matrices.† Obvious rules of composition for such groups are matrix addition and matrix multiplication. Thus the set of all m×n matrices is a group with respect to addition, with the zero matrix O as the unit element and −A as the inverse of A. Again, a vector space (or, more generally, a linear manifold) is a group with respect to addition. However, in almost all important examples of matrix groups, multiplication is the group operation.

Definition 9.2.1. A matrix group is a set of matrices which 
form a group with respect to matrix multiplication. 


† See, for instance, van der Waerden, 26.




It is thus to be understood that, unless the contrary is stated 
explicitly, the rule of composition is matrix multiplication. 

It is plain, in the first place, that a matrix group must consist of square matrices of the same order. Secondly, either all matrices of a group are non-singular or else all are singular. For let A be an element of a matrix group Γ and suppose that $|A| \neq 0$. If E denotes the unit element of Γ, then $AE = A$; and, premultiplying by $A^{-1}$, we obtain $E = I$. Now let X be any element of Γ, and denote its inverse element by X′. Then $X'X = E = I$, and this implies that $|X| \neq 0$. Hence, if one element is non-singular, then so is every element; and the assertion follows.

We must therefore distinguish between groups of non-singular 
and those of singular matrices. All important matrix groups are 
of the former type, and we shall for the present confine ourselves to 
these groups. In § 9.4 we shall turn to groups of singular matrices 
and shall show that, in a sense to be explained, their structure does 
not differ essentially from the structure of groups of non-singular 
matrices. 

Exercise 9.2.1. Discuss the fallacy in the following argument: ‘Groups of singular matrices cannot exist since such matrices do not possess inverses.’

Exercise 9.2.2. Show that the real matrices of the type , where 

$x \neq 0$, form a group which is isomorphic to the multiplicative group of real
non-zero numbers. 

The discussion a few lines above enables us not only to recognize 
that a matrix group cannot contain both singular and non-singular 
matrices, but also to draw the following conclusion. 

Theorem 9.2.1. In a group of non-singular matrices the unit 
element is the unit matrix, and the group inverse coincides with the 
matrix inverse. 

The second part of the theorem means simply that the inverse element X′ of X is actually equal to the inverse matrix $X^{-1}$, a conclusion which must not be taken for granted solely on the ground that the term ‘inverse’ is used in both matrix theory and group theory.

If we wish to show that any particular set of matrices constitutes a group, there is no need to verify every one of the axioms of Definition 9.1.1. We have, in fact, the following useful criterion.




Theorem 9.2.2. A set G of non-singular matrices (of the same order) is a matrix group if and only if, whenever A and B belong to G, so also do AB and $A^{-1}$.

In other words, G is a matrix group if and only if it is closed with respect to matrix multiplication and matrix inversion.

The proof is immediate. The stated condition is obviously necessary. It is also sufficient, for axiom (i) is implied by this condition, axiom (ii) holds automatically since matrix multiplication is associative, and axioms (iii) and (iv) follow since, if $A \in G$, then $A^{-1} \in G$ and $I = AA^{-1} \in G$.
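For a finite set of matrices, the criterion of Theorem 9.2.2 amounts to two membership tests. The Python sketch below assumes NumPy and compares matrices up to floating-point rounding with `np.allclose`. Its helper names are illustrative. It is tried on the four matrices I, J, −I, −J, where $J = \begin{pmatrix}0&1\\-1&0\end{pmatrix}$.

```python
import numpy as np

def contains(mats, M):
    """Membership up to floating-point rounding."""
    return any(np.allclose(M, X) for X in mats)

def is_matrix_group(mats):
    """Theorem 9.2.2 for a finite set of non-singular matrices of the same order:
    the set is a group iff it is closed under multiplication and inversion."""
    for A in mats:
        if np.isclose(np.linalg.det(A), 0.0):
            raise ValueError("the criterion concerns non-singular matrices only")
        if not contains(mats, np.linalg.inv(A)):
            return False
        if not all(contains(mats, A @ B) for B in mats):
            return False
    return True

I = np.eye(2)
J = np.array([[0.0, 1.0], [-1.0, 0.0]])
```

Here {I, J, −I, −J} passes the test, while {I, J} fails because $J^{-1} = -J$ is missing.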


Of the matrix groups mentioned below we shall recognize several as already familiar to us. Possibly the most important of these is the full linear group GL(n), which consists of all non-singular n×n matrices (over a specified reference field). In view of the discussion in § 4.2, it is clear that the full linear group can be used for studying linear transformations of vector spaces. When we are dealing with projective space, however, a slightly modified group is required. This is the projective group PGL(n−1), which is defined as the group of all non-singular n×n matrices, with the convention that two matrices which differ only by a scalar multiple are to be regarded as identical.†

The following sets of n×n matrices evidently form groups: (i) unimodular matrices, i.e. matrices whose determinants are equal to ±1; (ii) unitary matrices; (iii) orthogonal matrices; (iv) proper orthogonal matrices. These groups are known respectively as the unimodular group, the unitary group, the orthogonal group, and the rotation group.‡ It is clear that the third of these groups is a subgroup of the first and also of the second, and the fourth is a subgroup of the third.

Exercise 9.2.3. Show that the non-singular upper triangular matrices form a matrix group, but that the non-singular triangular matrices do not.

An important matrix group is provided by matrices of the type

$$\begin{pmatrix}a & b\\ -b & a\end{pmatrix}, \qquad (9.2.1)$$

† To make this definition precise we need the notion of a factor group (see Ledermann, op. cit., 102). Indeed, PGL(n−1) is the factor group GL(n)/S(n), where S(n) is the group of non-singular scalar matrices of order n.

‡ These and several other important groups have been the subject of very detailed study. For an account of the resulting theory see Murnaghan, 29, Dieudonné, 30, and Weyl, 31.



where a and b are real. We have


$$\begin{pmatrix}a & b\\ -b & a\end{pmatrix} + \begin{pmatrix}a' & b'\\ -b' & a'\end{pmatrix} = \begin{pmatrix}a+a' & b+b'\\ -(b+b') & a+a'\end{pmatrix},$$


and from this it is easily seen that the matrices (9.2.1) form a group with respect to addition. The group is isomorphic to the additive group of complex numbers, with the correspondence given by the scheme

$$\begin{pmatrix}a & b\\ -b & a\end{pmatrix} \leftrightarrow a + ib. \qquad (9.2.2)$$


Again, we have

$$\begin{pmatrix}a & b\\ -b & a\end{pmatrix}\begin{pmatrix}a' & b'\\ -b' & a'\end{pmatrix} = \begin{pmatrix}aa'-bb' & ab'+a'b\\ -(ab'+a'b) & aa'-bb'\end{pmatrix}.$$


Moreover, if a and b are not both zero, then the matrix (9.2.1) is non-singular, and

$$\begin{pmatrix}a & b\\ -b & a\end{pmatrix}^{-1} = \begin{pmatrix}a/(a^2+b^2) & -b/(a^2+b^2)\\ b/(a^2+b^2) & a/(a^2+b^2)\end{pmatrix}.$$

It follows by Theorem 9.2.2 that the matrices (9.2.1), with a, 6 
not both zero, form a group with respect to matrix multiplication. 
It is easy to verify that this group is isomorphic to the multiplicative 
group of non-zero complex numbers, and that the correspondence 
is once again given by the scheme (9.2.2). 
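Both isomorphisms given by the scheme (9.2.2) can be spot-checked numerically. In the Python sketch below, `as_matrix` is an illustrative name for the correspondence $a + ib \mapsto \begin{pmatrix}a&b\\-b&a\end{pmatrix}$, and NumPy is assumed.

```python
import numpy as np

def as_matrix(z):
    """The real matrix [[a, b], [-b, a]] corresponding to z = a + ib under (9.2.2)."""
    a, b = z.real, z.imag
    return np.array([[a, b], [-b, a]])

z, w = 2 + 3j, -1 + 0.5j
# sums, products and inverses of the matrices should mirror those of z and w
```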

The analogy, exhibited by the two isomorphisms, between the algebra of complex numbers and that of matrices of type (9.2.1) makes it possible to discuss the extension of the system of real numbers to the system of complex numbers by means of such matrices.† Furthermore, the second isomorphism implies that with any set of complex numbers forming a group with respect to multiplication there is associated an isomorphic group of real 2×2 matrices. Thus, for instance, the numbers 1, i, −1, −i form a multiplicative group; and an isomorphic matrix group consists of the matrices

$$\begin{pmatrix}1&0\\0&1\end{pmatrix}, \quad \begin{pmatrix}0&1\\-1&0\end{pmatrix}, \quad \begin{pmatrix}-1&0\\0&-1\end{pmatrix}, \quad \begin{pmatrix}0&-1\\1&0\end{pmatrix}.$$

The two groups are also isomorphic to the multiplicative group of residue classes (mod 5) prime to 5, and also to the group of permutations listed in (9.1.1) (p. 257).


† See, for example, Copson, An Introduction to the Theory of Functions of a Complex Variable, 2-5.



Next, consider the matrices

$$Q(z, w) = \begin{pmatrix}z & w\\ -\bar{w} & \bar{z}\end{pmatrix},$$

where z and w are complex numbers. These matrices are easily seen to form a group with respect to addition. Moreover

$$Q(z, w)\,Q(z', w') = Q(zz' - w\bar{w}',\; zw' + w\bar{z}'),$$

and, when z, w are not both zero,

$$\{Q(z, w)\}^{-1} = Q\bigl(\bar{z}/(z\bar{z}+w\bar{w}),\; -w/(z\bar{z}+w\bar{w})\bigr).$$

From these formulae it follows that the matrices Q(z, w), with z, w not both zero, form a group with respect to multiplication. The significance of these matrices lies in the fact that they constitute a system isomorphic (as regards both addition and multiplication) to an important system known as the ring of real quaternions.†
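The product and inverse formulae for Q(z, w) can be spot-checked numerically. The Python sketch below assumes NumPy and uses an illustrative function `Q`. It also checks that $\det Q(z, w) = z\bar{z} + w\bar{w}$. Finally, it shows that two such matrices need not commute, so this multiplicative group is non-abelian.

```python
import numpy as np

def Q(z, w):
    """The complex 2x2 matrix [[z, w], [-conj(w), conj(z)]]."""
    return np.array([[z, w], [-np.conj(w), np.conj(z)]], dtype=complex)

z, w, z1, w1 = 1 + 2j, 3 - 1j, -0.5 + 1j, 2 + 2j
d = abs(z) ** 2 + abs(w) ** 2    # z*conj(z) + w*conj(w), also the determinant of Q(z, w)
```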


9.2.2. The majority of groups mentioned earlier in this chapter may be regarded as groups of operators. Thus n×n matrices are operators in the space of n-vectors, functions are operators in the system of numbers, geometrical transformations are operators in geometrical space, and permutations are operators in the set of arrangements. We shall now introduce a few further groups of operators which arise in the theory of matrices.

We begin by considering similarity transformations; these may be regarded as operators in the set of matrices. If X is a non-singular matrix, we denote by $\tau_X$ the similarity transformation having X as the transforming matrix, so that, for every square matrix A,

$$\tau_X(A) = X^{-1}AX.$$

We observe that

$$(\tau_Y\tau_X)(A) = \tau_Y\{\tau_X(A)\} = \tau_Y(X^{-1}AX) = Y^{-1}(X^{-1}AX)Y = (XY)^{-1}A(XY) = \tau_{XY}(A),$$

and so

$$\tau_Y\tau_X = \tau_{XY}. \qquad (9.2.3)$$

In particular, therefore,

$$\tau_X\tau_{X^{-1}} = \tau_{X^{-1}}\tau_X = \tau_I, \qquad (9.2.4)$$

and $\tau_I$ is, of course, the identical transformation which carries every matrix into itself. It is now clear that similarity transformations form a group. For axiom (i) is satisfied by virtue of (9.2.3), which also implies axiom (ii); axiom (iii) is satisfied since $\tau_I$ has


† For details see McCoy, Rings and Ideals, 0 and 21.





the properties of a unit element; and axiom (iv) is satisfied since, by (9.2.4), $\tau_{X^{-1}}$ is seen to be the inverse of $\tau_X$.
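Relations (9.2.3) and (9.2.4) can be checked numerically. In the Python sketch below, a similarity transformation is represented as a Python function that closes over $X^{-1}$ and X. NumPy is assumed, and `tau` is an illustrative name. The random matrices X and Y are assumed to be non-singular, which holds with probability 1 for matrices drawn from a normal distribution.

```python
import numpy as np

def tau(X):
    """The similarity transformation A -> X^{-1} A X with transforming matrix X."""
    X_inv = np.linalg.inv(X)
    return lambda A: X_inv @ A @ X

rng = np.random.default_rng(0)
X, Y, A = (rng.standard_normal((3, 3)) for _ in range(3))   # X, Y assumed non-singular
# (9.2.3): tau(Y) applied after tau(X) agrees with tau(X @ Y)
```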


Definition 9.2.2. The group of similarity transformations† is known as the similarity group.


Exercise 9.2.4. Show that $\tau_{\alpha X} = \tau_X$, where α is any non-zero scalar.

Exercise 9.2.5. Discuss the fallacy in the following argument. ‘The elements of the similarity group can be put into biunique correspondence with those of the full linear group. Hence, by (9.2.3), the relations $\tau_X \leftrightarrow X^{-1}$, $\tau_Y \leftrightarrow Y^{-1}$ imply $\tau_Y\tau_X \leftrightarrow Y^{-1}X^{-1}$; and the two groups are therefore isomorphic.’

Certain subgroups of the similarity group play an important part 
in the discussion of the later chapters. 

Definition 9.2.3. If the matrices A and B are connected by the relation $B = U^{-1}AU$, where U is unitary (orthogonal), then B is said to be obtained from A by a unitary (orthogonal) similarity transformation; and A and B are said to be unitarily (orthogonally) SIMILAR to each other.‡

Using Theorems 8.1.4 (p. 224) and 8.2.2 (p. 230) the reader can readily convince himself that the unitary similarity transformations and the orthogonal similarity transformations form groups.

Definition 9.2.4. The group of unitary (orthogonal) similarity transformations is called the unitary (orthogonal) similarity group.

An analogous situation arises in the case of congruence transformations. We recall that a matrix A is said to be subjected to a congruence transformation if it is transformed into $P'AP$, where P is some non-singular matrix. It is easy to verify that the congruence transformations form a group. If P is restricted to orthogonal matrices, we call the transformation an orthogonal congruence transformation; such transformations again form a group.

Definition 9.2.5. The group of congruence (orthogonal congruence) transformations is called the congruence (orthogonal congruence) group.

The reader will have noticed that the terms ‘orthogonal congruence transformation’ and ‘orthogonal congruence group’ are,

† We speak here, of course, of transformations of matrices of a fixed order.

‡ The relation between A and B is, of course, symmetric and is, indeed, an equivalence relation.




in fact, synonymous with the terms ‘orthogonal similarity transformation’ and ‘orthogonal similarity group’ respectively. Nevertheless, it is useful to retain both pairs of terms.

Another type of transformation that arises in the discussion of Part III is that specified by the scheme $A \to \bar{P}'AP$, where $|P| \neq 0$. Such a transformation is called conjunctive, and the set of conjunctive transformations is a group. If P is unitary, then the transformation is simply a unitary similarity transformation.

9.2.3. Groups of operators are important in the study of equivalence relations. Thus it is clear that if G is a group of operators in a set S of objects, then it satisfies conditions (i)-(iii) of Theorem 6.5.2 (p. 188). Hence, in view of Theorem 6.5.2 and Definition 6.5.3, we can speak of equivalence in S with respect to the group G of operators. This notion enables us, in particular, to gain a unified view of several branches of geometry. Thus, for example, euclidean geometry is seen to be the study of properties of figures from the point of view of equivalence with respect to the group of euclidean operations: translations, rotations, and reflections. More generally, any group of operators in a space defines a corresponding geometry.†

9.3. Representation of groups by matrices 

9.3.1. In investigating the structural properties of a group G we may find that the problems we wish to solve are intractable if G is given in an unmanageable form, say in the form of a multiplication table. In such circumstances it can be advantageous to replace G by some isomorphic group G′, chosen in such a way that it can be handled easily and efficiently. It is often particularly useful to take as G′ a group of matrices, for then the highly developed matrix technique at once becomes available in our investigation.

Definition 9.3.1. If G is a group and G′ a matrix group isomorphic to G, then G′ is called a representation (or, more precisely, a FAITHFUL representation) of G.

The branch of algebra which deals with the representation of 
groups by matrix groups is known as the theory of representations. 
It was initiated by Frobenius at about the turn of the century and 
has since been developed more systematically and more intensively 

† For further elaboration of this remark see Semple and Kneebone, Algebraic Projective Geometry, chap. i.





than any other part of group theory. Here we can do little more than give some examples of matrix representations.†

It has already been noted that the matrices
$$\begin{pmatrix} a & b \\ -b & a \end{pmatrix}, \qquad (9.3.1)$$
where a, b are real and not both zero, provide a representation of the multiplicative group of all non-zero complex numbers. This implies, in particular, that any multiplicative group of complex numbers possesses a representation by real matrices; an example of such a representation was given on p. 264. We can use the same idea to obtain a representation, in terms of real matrices, for the group Q of matrices
$$\begin{pmatrix} z & w \\ -\bar{w} & \bar{z} \end{pmatrix}, \qquad (9.3.2)$$
where z, w are complex and not both zero. If we write $z = a + ib$, $w = c + id$ and replace each element in (9.3.2) by the appropriate matrix of type (9.3.1), we recognize easily that the $4 \times 4$ matrices
$$\begin{pmatrix} a & b & c & d \\ -b & a & -d & c \\ -c & d & a & -b \\ -d & -c & b & a \end{pmatrix},$$
where a, b, c, d are real and not all zero, represent the group Q.
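The two representations just described can be checked mechanically. The Python sketch below is only an illustration. The helper names (`rep2`, `rep4`, `qmul`, `matmul`) are hypothetical, and matrices are stored as lists of rows. Complex numbers with small integer parts are used so that floating-point arithmetic stays exact. The checks confirm two things: the matrix of type (9.3.1) attached to a product of complex numbers equals the product of the attached matrices, and the $4 \times 4$ matrices multiply in the same way as the matrices (9.3.2) they come from.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def rep2(z):
    """Matrix of type (9.3.1) attached to the complex number z = a + ib."""
    a, b = z.real, z.imag
    return [[a, b], [-b, a]]

def rep4(z, w):
    """Real 4x4 matrix obtained from (9.3.2) with z = a + ib, w = c + id."""
    a, b, c, d = z.real, z.imag, w.real, w.imag
    return [[a, b, c, d],
            [-b, a, -d, c],
            [-c, d, a, -b],
            [-d, -c, b, a]]

def qmul(p, q):
    """Product of two matrices of type (9.3.2), each stored as the pair (z, w)
    forming its first row; the product has the same shape, with first row
    (z1 z2 - w1 conj(w2), z1 w2 + w1 conj(z2))."""
    (z1, w1), (z2, w2) = p, q
    return (z1 * z2 - w1 * w2.conjugate(), z1 * w2 + w1 * z2.conjugate())

z1, w1 = complex(1, 2), complex(3, -1)
z2, w2 = complex(-2, 1), complex(0, 4)
print(matmul(rep2(z1), rep2(z2)) == rep2(z1 * z2))                            # True
print(matmul(rep4(z1, w1), rep4(z2, w2)) == rep4(*qmul((z1, w1), (z2, w2))))  # True
```

The 4×4 check works because `rep2` respects both sums and products, so block multiplication of the 2×2 blocks reproduces the product in Q.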

Again, the group of plane rotations about the origin is represented by the rotation group of $2 \times 2$ matrices. More precisely, in this representation the matrix
$$\begin{pmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{pmatrix}$$
corresponds to the rotation $R(\alpha)$ through an angle $\alpha$. The group of space rotations about the origin is represented by the rotation group of $3 \times 3$ matrices.
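Here is a quick numerical check of the plane-rotation example. It assumes angles in radians and ordinary floating-point arithmetic, which is why the hypothetical helper `close` compares entries up to a small tolerance. Composing the rotations $R(\alpha)$ and $R(\beta)$ corresponds to multiplying their matrices, and the product is the matrix of $R(\alpha + \beta)$.

```python
import math

def rotation(alpha):
    """2x2 matrix representing the plane rotation R(alpha) about the origin."""
    c, s = math.cos(alpha), math.sin(alpha)
    return [[c, -s], [s, c]]

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def close(X, Y, tol=1e-12):
    """Entrywise equality up to floating-point rounding."""
    return all(math.isclose(x, y, abs_tol=tol)
               for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

a, b = 0.7, 2.1
# Applying R(a) and then R(b) is the rotation R(a + b) ...
print(close(matmul(rotation(b), rotation(a)), rotation(a + b)))      # True
# ... and R(-a) is the inverse of R(a).
print(close(matmul(rotation(a), rotation(-a)), [[1, 0], [0, 1]]))    # True
```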

The intimate connexion between matrices and linear transformations of linear manifolds has already been noted in Chapter IV. We can now show that certain sets of linear transformations form groups which admit of matrix representations.

Theorem 9.3.1. The set of all automorphisms of a linear manifold of dimensionality n is a group isomorphic to the full linear group GL(n).

† For a systematic discussion see e.g. Speiser, 27, chaps. xi-xv, and Murnaghan, 29.






Let $\mathfrak{M}$ be the linear manifold and let $L_1, L_2$ be two linear transformations of $\mathfrak{M}$ into itself. If $\mathfrak{B}$ is any basis of $\mathfrak{M}$, write
$$A_1 = \mathfrak{R}(L_1; \mathfrak{B}), \qquad A_2 = \mathfrak{R}(L_2; \mathfrak{B}).^{\dagger}$$
For any $X \in \mathfrak{M}$, put
$$X' = L_1(X), \qquad X'' = (L_2L_1)(X) = L_2(X'),$$
$$x = \mathfrak{R}(X; \mathfrak{B}), \qquad x' = \mathfrak{R}(X'; \mathfrak{B}), \qquad x'' = \mathfrak{R}(X''; \mathfrak{B}).$$
Then
$$x' = A_1x, \qquad x'' = A_2x',$$
and so $x'' = A_2A_1x$. This means that $L_2L_1$ is a linear transformation of $\mathfrak{M}$ into itself, and that
$$A_2A_1 = \mathfrak{R}(L_2L_1; \mathfrak{B}). \qquad (9.3.3)$$

Now, if $L_1, L_2$ are automorphisms of $\mathfrak{M}$, then, by Theorem 4.3.3 (p. 125), $A_1$ and $A_2$ are non-singular. Hence $A_2A_1$ is non-singular and so, again by Theorem 4.3.3, $L_2L_1$ is an automorphism of $\mathfrak{M}$.

To show that the set of all automorphisms of $\mathfrak{M}$ is a group, we have to verify the four group axioms. Axiom (i) is satisfied, as we have just observed; and axiom (ii) holds by virtue of the associative law for multiplication of operators.‡ The identical automorphism, say $L_0$, which transforms every element into itself, acts as the unit element, and so axiom (iii) is satisfied. Finally, let L be any automorphism of $\mathfrak{M}$, and write
$$A = \mathfrak{R}(L; \mathfrak{B}). \qquad (9.3.4)$$
If the automorphism $L^*$ is defined by the equation
$$A^{-1} = \mathfrak{R}(L^*; \mathfrak{B}),$$
then, by (9.3.3),
$$I = \mathfrak{R}(LL^*; \mathfrak{B}) = \mathfrak{R}(L^*L; \mathfrak{B}).$$
In other words, $LL^* = L^*L = L_0$. Thus $L^*$ is the inverse of L, and axiom (iv) is satisfied.

The automorphisms of $\mathfrak{M}$ thus form a group H. Consider now the biunique correspondence, given by (9.3.4), between the elements of H and those of GL(n). In view of (9.3.3) it follows at once that $H \cong GL(n)$.
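Relation (9.3.3) is the heart of this proof. As an illustration only, the sketch below works in the space of 3-component rational vectors with the standard basis, whereas the proof allows any basis $\mathfrak{B}$. The hypothetical helper `matrix_of` builds the representing matrix column by column from the images of the basis vectors. The sample automorphisms `L1` and `L2` are arbitrary choices made for the example.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def matrix_of(f, n):
    """Matrix of the linear map f w.r.t. the standard basis e_1, ..., e_n:
    its jth column is the image f(e_j)."""
    images = [f([Fraction(int(i == j)) for i in range(n)]) for j in range(n)]
    return [[images[j][i] for j in range(n)] for i in range(n)]

# Two sample automorphisms of 3-space (arbitrary choices for the illustration).
def L1(x):
    return [x[0] + x[1], x[1], 2 * x[2]]

def L2(x):
    return [x[2], x[0], x[1] - x[0]]

A1, A2 = matrix_of(L1, 3), matrix_of(L2, 3)
A21 = matrix_of(lambda x: L2(L1(x)), 3)    # matrix of the product L2 L1
print(A21 == matmul(A2, A1))               # True, as asserted in (9.3.3)
```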

Exercise 9.3.1. Show that any group of automorphisms of a linear manifold of dimensionality n is isomorphic to a group of non-singular $n \times n$ matrices.


† This notation is explained in Definition 4.2.4, p. 118.

‡ Alternatively, axiom (ii) follows from (9.3.3) and the associative law for matrix multiplication.





Exercise 9.3.2. Let Γ be a set of $n \times n$ matrices and Γ' the set of linear transformations of $\mathfrak{V}_n$ into itself represented by the matrices of Γ with respect to some fixed basis in $\mathfrak{V}_n$. Show that if Γ is a group, then so is Γ', and conversely; and that $\Gamma \cong \Gamma'$.

9.3.2. We have been concerned, so far, with matrix representations of infinite groups. The possibility of representing finite groups by matrices is, however, equally interesting and we shall show that every finite group can, in fact, be so represented.

Theorem 9.3.2. Every finite group is isomorphic to some matrix group.†


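Let the elements of a group G of order n be denoted by $a_1, \dots, a_n$. With each element $a_r$ let us associate the $n \times n$ matrix $M(a_r)$ defined by the equations
$$\{M(a_r)\}_{ij} = \begin{cases} 1 & (a_i^{-1}a_j = a_r), \\ 0 & (a_i^{-1}a_j \neq a_r) \end{cases} \qquad (i, j = 1, \dots, n). \qquad (9.3.5)$$
In particular, if e denotes the unit element of G (so that e is one of the $a_r$), then
$$\{M(e)\}_{ij} = \begin{cases} 1 & (i = j), \\ 0 & (i \neq j), \end{cases}$$
so that $M(e) = I$.

Since evidently $M(a_r) \neq M(a_s)$ for $r \neq s$, it follows that the definition (9.3.5) sets up a biunique correspondence between G and the set G' of matrices $M(a_r)$.

We have, for $r, s, i, j = 1, \dots, n$,
$$\{M(a_r)M(a_s)\}_{ij} = \sum_{k=1}^{n} \{M(a_r)\}_{ik}\{M(a_s)\}_{kj} = \sum_k 1,$$
where the conditions under the sign of summation on the right-hand side are $a_i^{-1}a_k = a_r$, $a_k^{-1}a_j = a_s$, $1 \le k \le n$, i.e.
$$a_k = a_ia_r = a_ja_s^{-1}, \qquad 1 \le k \le n. \qquad (9.3.6)$$
If $a_ia_r \neq a_ja_s^{-1}$, then there exists no value of k satisfying (9.3.6), and the sum has the value 0. If, on the other hand, $a_ia_r = a_ja_s^{-1}$, then precisely one value of k satisfies (9.3.6), and the sum has the value 1. Thus, since $a_ia_r = a_ja_s^{-1}$ is equivalent to $a_i^{-1}a_j = a_ra_s$,
$$\{M(a_r)M(a_s)\}_{ij} = \begin{cases} 1 & (a_i^{-1}a_j = a_ra_s) \\ 0 & (a_i^{-1}a_j \neq a_ra_s) \end{cases} = \{M(a_ra_s)\}_{ij},$$
and so
$$M(a_r)M(a_s) = M(a_ra_s) \qquad (r, s = 1, \dots, n). \qquad (9.3.7)$$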


† This result should be compared with Cayley's theorem that every finite group is isomorphic to some group of permutations. See Ledermann, op. cit., 78.



In particular, this implies
$$M(a_r)M(a_r^{-1}) = M(e) = I,$$
i.e.
$$\{M(a_r)\}^{-1} = M(a_r^{-1}) \qquad (r = 1, \dots, n). \qquad (9.3.8)$$
By (9.3.7) and (9.3.8), we see that the set G' is closed with respect to multiplication and inversion of matrices. Hence, by Theorem 9.2.2 (p. 263), G' is a group; and, by (9.3.7), $G \cong G'$.

It may be noted that each matrix $M(a_r)$ has precisely one 1 in each row and precisely one 1 in each column, all other elements being equal to 0. It is thus plainly orthogonal and we see that every group of order n is isomorphic to a subgroup of the group of orthogonal $n \times n$ matrices.
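The construction (9.3.5) is easy to carry out by machine. The sketch below uses hypothetical helper names, and the group is supplied as a list of elements together with its multiplication and inversion. It builds the $6 \times 6$ matrices $M(a_r)$ for the symmetric group on three symbols. Permutations are stored as 0-based tuples and multiplied by the convention $(pq)(i) = p(q(i))$, which is an assumption of the example; the construction works for any group law. The checks confirm (9.3.7), $M(e) = I$, and the orthogonality just remarked on.

```python
from itertools import permutations

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def compose(p, q):
    """Product pq of permutations stored as 0-based tuples: (pq)(i) = p(q(i))."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    """Inverse of the permutation p (0-based tuple)."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def regular_matrix(elements, a, mul, inv):
    """M(a) as in (9.3.5): entry (i, j) is 1 exactly when a_i^{-1} a_j = a."""
    return [[1 if mul(inv(ai), aj) == a else 0 for aj in elements]
            for ai in elements]

G = list(permutations(range(3)))           # the six elements of S_3
M = {a: regular_matrix(G, a, compose, inverse) for a in G}
e = (0, 1, 2)                              # unit element of G
I = [[int(i == j) for j in range(6)] for i in range(6)]

# (9.3.7): M(a) M(b) = M(ab) for every pair of elements
print(all(matmul(M[a], M[b]) == M[compose(a, b)] for a in G for b in G))  # True
```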

The proof of Theorem 9.3.2 not merely establishes the existence of a matrix representation for every finite group but describes an actual procedure for constructing such a representation. In particular, it provides a method for representing groups of permutations by means of matrices. However, the resulting representations are generally unwieldy, the symmetric group $S_n$ requiring, for instance, matrices of order $n!$. It is therefore useful to note that a slight modification in the procedure leads to a more convenient representation of groups of permutations.

Theorem 9.3.3. Any group of permutations of degree n can be represented by $n \times n$ matrices.

Let G be a group of permutations of degree n, and let $p \in G$ be the permutation which changes the arrangement $(1, \dots, n)$ into $(a_1, \dots, a_n)$. We can regard p as a function of a single variable such that
$$p(1) = a_1, \quad \dots, \quad p(n) = a_n.$$
With the permutation p we associate the $n \times n$ matrix N(p) defined by the equations
$$\{N(p)\}_{ij} = \delta_{i a_j} \qquad (i, j = 1, \dots, n). \qquad (9.3.9)$$
Thus the first column of N(p) contains a 1 in the $a_1$th place, the second column contains a 1 in the $a_2$th place, and so on, while all the remaining elements are equal to zero. Matrices of this type are known as permutation matrices.

The definition (9.3.9) sets up a biunique correspondence between G and the set G' of matrices N(p). The reader should have no difficulty in verifying that, for any two permutations $p, q \in G$,
$$N(p)N(q) = N(pq).$$
This implies, in particular, that $\{N(p)\}^{-1} = N(p^{-1})$. From these two relations we recognize that G' is a group and that it is isomorphic to G.


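As an example consider the symmetric group $S_3$, which consists of the six permutations
$$\begin{pmatrix}1&2&3\\1&2&3\end{pmatrix}, \quad \begin{pmatrix}1&2&3\\1&3&2\end{pmatrix}, \quad \begin{pmatrix}1&2&3\\2&1&3\end{pmatrix}, \quad \begin{pmatrix}1&2&3\\2&3&1\end{pmatrix}, \quad \begin{pmatrix}1&2&3\\3&1&2\end{pmatrix}, \quad \begin{pmatrix}1&2&3\\3&2&1\end{pmatrix}.$$
The proof just given shows that $S_3$ is isomorphic to the group of the six $3 \times 3$ matrices
$$\begin{pmatrix}1&0&0\\0&1&0\\0&0&1\end{pmatrix}, \quad \begin{pmatrix}1&0&0\\0&0&1\\0&1&0\end{pmatrix}, \quad \begin{pmatrix}0&1&0\\1&0&0\\0&0&1\end{pmatrix}, \quad \begin{pmatrix}0&0&1\\1&0&0\\0&1&0\end{pmatrix}, \quad \begin{pmatrix}0&1&0\\0&0&1\\1&0&0\end{pmatrix}, \quad \begin{pmatrix}0&0&1\\0&1&0\\1&0&0\end{pmatrix}.$$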
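The following sketch constructs the permutation matrices (9.3.9) for $S_3$ and reproduces the six matrices listed above. Permutations are stored as tuples $(a_1, \dots, a_n)$ of 1-based images. The product is taken as $(pq)(j) = p(q(j))$, so q is applied first. That convention is an assumption of this sketch: it is the one under which $N(p)N(q) = N(pq)$ holds with $\{N(p)\}_{ij} = \delta_{i a_j}$. The helper names are hypothetical.

```python
from itertools import permutations

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def N(p):
    """Permutation matrix (9.3.9): column j has its single 1 in row p(j).
    p is a tuple (a_1, ..., a_n) of 1-based images."""
    n = len(p)
    return [[1 if p[j] == i + 1 else 0 for j in range(n)] for i in range(n)]

def compose(p, q):
    """(pq)(j) = p(q(j)): q acts first (the convention assumed here)."""
    return tuple(p[q[j] - 1] for j in range(len(q)))

def inverse(p):
    """Inverse of the permutation p (1-based tuple)."""
    inv = [0] * len(p)
    for j, image in enumerate(p):
        inv[image - 1] = j + 1
    return tuple(inv)

S3 = sorted(permutations((1, 2, 3)))       # same order as in the text
for p in S3:
    print(p, N(p))
```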


9.4. Groups of singular matrices

The existence of groups of singular matrices has been indicated in § 9.2.1. As an example of such a group, a little less obvious than that given in Exercise 9.2.2 (p. 262), we may mention the set of matrices of the form
$$A(x) = \begin{pmatrix} x & x \\ x & x \end{pmatrix},$$
where $x \neq 0$. This set is a matrix group, with $A(\tfrac{1}{2})$ as its unit element and $A(1/4x)$ as inverse of $A(x)$.

We propose now to investigate more systematically groups of singular matrices with a view to elucidating the relation between the structure of such groups and that of groups of non-singular matrices. In the course of our discussion we shall also discover certain criteria for deciding whether a given matrix is an element, or even the unit element, of a matrix group. It is, at any rate, clear that not every matrix is an element of a suitable matrix group. For if the matrix
$$A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$$
were an element of a group Γ, then so would be $A^2$. Now $A^2 = O$, and a group which contains the zero matrix can contain no other element.

Exercise 9.4.1. Prove the last statement.
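The two examples at the start of this section can be checked with exact rational arithmetic. The sketch assumes that the singular matrices of the first example are $A(x) = \begin{pmatrix} x & x \\ x & x \end{pmatrix}$. Under that assumption $A(x)A(y) = A(2xy)$, which identifies $A(\tfrac12)$ as the unit element and $A(1/4x)$ as the inverse of $A(x)$. The sketch also confirms that $\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$ squares to the zero matrix. The function names are illustrative only.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def A(x):
    """Assumed form of the singular matrices in the first example: every entry x."""
    return [[x, x], [x, x]]

x, y = Fraction(3), Fraction(-2, 5)
unit = A(Fraction(1, 2))

print(matmul(A(x), A(y)) == A(2 * x * y))                 # True: closed under products
print(matmul(unit, A(x)) == A(x) == matmul(A(x), unit))   # True: A(1/2) is the unit
print(matmul(A(x), A(1 / (4 * x))) == unit)               # True: A(1/4x) is the inverse

Z = [[0, 1], [0, 0]]
print(matmul(Z, Z))                                       # [[0, 0], [0, 0]]
```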





Definition 9.4.1. (i) A group matrix is a matrix which is an 
element of at least one matrix group, (ii) A group unit matrix is a 
matrix which is the unit element of at least one matrix group. 

We begin with an almost obvious result. 

Theorem 9.4.1. (i) If A is a group matrix, then so is every 
matrix similar to A. (ii) If A is a group unit matrix, then so is every 
matrix similar to A. 

Let A be an element of a matrix group Γ and let $A' = S^{-1}AS$. We have to show that A' is also an element of some matrix group. Consider the set Γ' of matrices
$$S^{-1}XS \qquad (X \in \Gamma).$$
There is clearly a biunique correspondence $X \leftrightarrow S^{-1}XS$ between the elements of Γ and those of Γ'. Now
$$S^{-1}XS \cdot S^{-1}YS = S^{-1}XYS,$$
and hence, if $X', Y' \in \Gamma'$, then $X'Y' \in \Gamma'$. Again, let E be the unit element of Γ and let $X^*$ be the inverse element of $X \in \Gamma$. Then
$$EX = XE = X, \qquad X^*X = XX^* = E,$$
and so
$$S^{-1}ES \cdot S^{-1}XS = S^{-1}XS \cdot S^{-1}ES = S^{-1}XS,$$
$$S^{-1}X^*S \cdot S^{-1}XS = S^{-1}XS \cdot S^{-1}X^*S = S^{-1}ES.$$
Thus the set Γ' satisfies all the necessary axioms and so is a group. Moreover, $A' = S^{-1}AS$ is an element of this group.

The second part of the theorem follows in exactly the same way, for if A is the unit element of Γ, then A' is seen to be the unit element of Γ'.

Exercise 9.4.2. Show that if Γ is a group of matrices, S a non-singular matrix, and Γ' the group of matrices of the form $S^{-1}XS$ $(X \in \Gamma)$, then $\Gamma' \cong \Gamma$.

Theorem 9.4.2. The following four statements are equivalent.

(i) A is a group matrix.

(ii) $R(A) = R(A^2)$.

(iii) A is similar to a matrix of the form
$$\begin{pmatrix} A_1 & O \\ O & O \end{pmatrix}, \qquad (9.4.1)$$
where $A_1$ is a non-singular square matrix.

(iv) $R(A) = n - m_0(A)$.†

† The reader is reminded that $m_0(A)$ denotes the number of zero characteristic roots of A.

Write $R(A) = r$. If A is similar to the matrix (9.4.1) and $A_1$ is non-singular, then $A_1$ is of type $r \times r$. The form (9.4.1) must, of course, be interpreted in an obvious sense when $r = 0$ or $r = n$; but in these cases the theorem is evidently valid, and we shall therefore assume that $0 < r < n$.

Let (i) be given, i.e. let A be an element of a matrix group Γ. Denoting by B the group inverse of A and by E the unit element of Γ, we have
$$E = AB, \qquad A = AE.$$
Therefore, by Theorem 5.6.2 (p. 159),
$$R(E) = R(AB) \le R(A) = R(AE) \le R(E),$$
and so $R(A) = R(E)$. All matrices in Γ have therefore equal rank, and so $R(A) = R(A^2)$. Thus (i) implies (ii).

Next, let (iii) be given. The matrix (9.4.1) is evidently a group matrix, since it belongs to the group of all matrices of the form
$$\begin{pmatrix} X & O \\ O & O \end{pmatrix}, \qquad (9.4.2)$$
where X is a non-singular matrix of order r. Hence, by Theorem 9.4.1 (i), A, too, is a group matrix; and so (iii) implies (i).

Again, if A satisfies (iii), then, by Theorem 7.2.1 (p. 199), $m_0(A) = m_0(C)$, where C denotes the matrix (9.4.1). Hence
$$m_0(A) = n - r = n - R(A),$$
i.e. (iii) implies (iv).

If (iv) is given, then by Theorem 7.6.1 (p. 214) and the corollary to Theorem 7.3.3 (p. 202), we have
$$R(A^2) \ge n - m_0(A^2) = n - m_0(A) = R(A),$$
so that $R(A) \le R(A^2)$. But, by Theorem 5.6.2, $R(A) \ge R(A^2)$. Hence $R(A) = R(A^2)$, and (iv) implies (ii).

To complete the proof of the theorem it is sufficient to show that (ii) implies (iii). We denote by $\mathfrak{U}$ the vector space of all vectors of the form Ax $(x \in \mathfrak{V}_n)$, and by $\mathfrak{U}'$ the vector space of all vectors $x \in \mathfrak{V}_n$ which satisfy $Ax = 0$. Then $d(\mathfrak{U}) = r$, $d(\mathfrak{U}') = n - r$; and moreover, since $R(A) = R(A^2)$, it follows by Theorem 5.6.4 (p. 161) (with B = A) that the only vector common to $\mathfrak{U}$ and $\mathfrak{U}'$ is 0. Hence, by Theorem 2.3.7 (p. 65), $\mathfrak{U}$ and $\mathfrak{U}'$ are complements. If, then, $\{x_1, \dots, x_r\}$ is a basis of $\mathfrak{U}$ and $\{x_{r+1}, \dots, x_n\}$ a basis of $\mathfrak{U}'$, it follows by Theorem 2.3.8 that $\mathfrak{X} = \{x_1, \dots, x_r, x_{r+1}, \dots, x_n\}$ is a basis of $\mathfrak{V}_n$.

Let the linear transformation L of $\mathfrak{V}_n$ into itself be defined by the equation $A = \mathfrak{R}(L; \mathfrak{E})$, where $\mathfrak{E}$ denotes the basis $\{e_1, \dots, e_n\}$ of $\mathfrak{V}_n$. Let the matrix B be defined by the equation $B = \mathfrak{R}(L; \mathfrak{X})$.†

Now, by Theorem 4.2.6 (p. 121), $L(x) = Ax$, and hence each of $L(x_1), \dots, L(x_r)$ is a linear combination of $x_1, \dots, x_r$, while
$$L(x_{r+1}) = \cdots = L(x_n) = 0.$$
Moreover, by equation (4.2.13) (p. 121),
$$L(x_j) = \sum_{i=1}^{n} b_{ij}x_i \qquad (j = 1, \dots, n),$$
where $B = (b_{ij})$. Hence $b_{ij} = 0$ for $i = r+1, \dots, n$; $j = 1, \dots, r$, and for $i = 1, \dots, n$; $j = r+1, \dots, n$. Thus B has the form (9.4.1), its leading $r \times r$ block being non-singular because $R(B) = R(A) = r$; and the theorem is proved since A and B are similar.
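Criterion (ii) of Theorem 9.4.2 gives a practical test for group matrices. The sketch below computes ranks by Gaussian elimination over the rationals (`fractions.Fraction`, so no rounding occurs) and applies the test $R(A) = R(A^2)$. The function names and the sample matrices are choices made for illustration.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def rank(M):
    """Rank of a rational matrix, by Gaussian elimination with exact arithmetic."""
    rows = [[Fraction(v) for v in row] for row in M]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):          # clear the column below the pivot
            f = rows[i][c] / rows[r][c]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def is_group_matrix(A):
    """Criterion (ii) of Theorem 9.4.2: R(A) = R(A^2)."""
    return rank(A) == rank(matmul(A, A))

print(is_group_matrix([[1, 1], [1, 1]]))                    # True
print(is_group_matrix([[0, 1], [0, 0]]))                    # False
print(is_group_matrix([[1, 1, 0], [0, 0, 1], [0, 0, 0]]))   # False
```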

Theorem 9.4.3. The following three statements are equivalent.

(i) A is a group unit matrix.

(ii) $A^2 = A$.

(iii) A is similar to a matrix of the form
$$\begin{pmatrix} I_r & O \\ O & O \end{pmatrix}. \qquad (9.4.3)$$

Here again the form (9.4.3) must be interpreted suitably for $r = 0$ and $r = n$, where $r = R(A)$. In these two cases the theorem is true trivially and we shall therefore assume that $0 < r < n$.

If (i) is given, then (ii) follows at once. Again, if A satisfies (ii), then, by Theorem 9.4.2, it is a group matrix and so is similar to a matrix of the form (9.4.1), where $A_1$ is non-singular and of order r. In view of (ii) we have $A_1^2 = A_1$, and therefore $A_1 = I_r$. Thus (ii) implies (iii).

Finally, let (iii) be given. Now the matrix (9.4.3) is obviously the unit element in the group of matrices of type (9.4.2). Hence, by Theorem 9.4.1 (ii), A is a group unit matrix. Thus (iii) implies (i), and the proof is complete.
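Theorem 9.4.3 reduces the recognition of group unit matrices to the equation $A^2 = A$. A matrix similar to (9.4.3) has trace r, because the trace is unchanged by similarity. An idempotent matrix therefore also has trace equal to its rank. The sketch below checks both facts on a few sample matrices. The names are illustrative and the arithmetic is exact.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def rank(M):
    """Rank of a rational matrix, by Gaussian elimination with exact arithmetic."""
    rows = [[Fraction(v) for v in row] for row in M]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def is_group_unit_matrix(A):
    """Criterion (ii) of Theorem 9.4.3: A^2 = A."""
    return matmul(A, A) == A

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

samples = [
    [[1, 1], [0, 0]],
    [[2, -2], [1, -1]],
    [[1, 0, 1], [0, 1, 1], [0, 0, 0]],
]
for E in samples:
    # each sample is idempotent, and its trace equals its rank r,
    # as it must for a matrix similar to (9.4.3)
    print(is_group_unit_matrix(E), trace(E), rank(E))
```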

Theorem 9.4.4. For $r > 0$, every group of matrices of rank r‡ is isomorphic to a group of non-singular $r \times r$ matrices.

† In point of fact we know, by Theorem 4.2.8 (p. 121), that $B = X^{-1}AX$, where X is the matrix whose ith column is $x_i$ $(i = 1, \dots, n)$.

‡ The fact that all matrices of a matrix group have equal rank was established in the proof of Theorem 9.4.2.




This theorem shows that all results on the structure of matrix 
groups can be obtained from the study of non-singular matrices 
alone. 
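The proof below carries out this reduction in general; the sketch shows it in one concrete case. It takes again the group of singular matrices from the beginning of § 9.4, assumed here to consist of the matrices $A(x) = \begin{pmatrix} x & x \\ x & x \end{pmatrix}$ $(x \neq 0)$. With the non-singular matrix S whose columns are $(1, 1)$ and $(1, -1)$ (a choice made for this example), every $S^{-1}A(x)S$ has the form (9.4.1) with the $1 \times 1$ block $2x$. The group is therefore isomorphic to the multiplicative group of non-zero numbers. The names are illustrative.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def A(x):
    """Assumed form of the example matrices of section 9.4: every entry equal to x."""
    return [[x, x], [x, x]]

S = [[1, 1], [1, -1]]                        # columns (1, 1) and (1, -1)
S_inv = [[Fraction(1, 2), Fraction(1, 2)],
         [Fraction(1, 2), Fraction(-1, 2)]]

def reduced(x):
    """S^{-1} A(x) S, which comes out as diag(2x, 0)."""
    return matmul(matmul(S_inv, A(x)), S)

print(reduced(Fraction(1, 2)) == [[1, 0], [0, 0]])   # True: the unit goes to diag(1, 0)
print(reduced(Fraction(3)) == [[6, 0], [0, 0]])      # True
```

Since $A(x)A(y) = A(2xy)$ and $2(2xy) = (2x)(2y)$, the correspondence $A(x) \leftrightarrow 2x$ respects multiplication.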

Let E be the unit element of a matrix group Γ. Then, by Theorem 9.4.3, there exists a (non-singular) matrix S such that
$$S^{-1}ES = \begin{pmatrix} I_r & O \\ O & O \end{pmatrix}.$$
For any $A \in \Gamma$ we write
$$S^{-1}AS = \begin{pmatrix} A_1 & A_2 \\ A_3 & A_4 \end{pmatrix},$$
where $A_1$ is of type $r \times r$. Then
$$S^{-1}EAS = S^{-1}ES \cdot S^{-1}AS = \begin{pmatrix} A_1 & A_2 \\ O & O \end{pmatrix},$$
$$S^{-1}AES = S^{-1}AS \cdot S^{-1}ES = \begin{pmatrix} A_1 & O \\ A_3 & O \end{pmatrix}.$$
But
$$S^{-1}AS = S^{-1}AES = S^{-1}EAS,$$
and so
$$A_2 = O, \qquad A_3 = O, \qquad A_4 = O.$$
Thus
$$S^{-1}AS = \begin{pmatrix} A_1 & O \\ O & O \end{pmatrix},$$
and here the $r \times r$ matrix $A_1$ must be non-singular since $R(A_1) = R(A) = r$. Now S is independent of A and depends only on Γ. Hence, by Exercise 9.4.2, Γ is isomorphic to a group of matrices having the form (9.4.1), and this, in turn, is obviously isomorphic to a group of non-singular $r \times r$ matrices. Since isomorphism is an equivalence relation the theorem is proved.

Exercise 9.4.3. Deduce Theorem 9.4.4 without making use of Theorem 9.4.3.

9.5. Invariant spaces and groups of linear transformations

In developing the theory of linear transformations we are bound to inquire into the relations that exist between vector spaces and their image spaces under such transformations. We propose in the present section to study this problem by invariant methods; that is, by making use only of the intrinsic properties of linear transformations and not of their matrix representations. The most important new concept we shall introduce is that of an invariant space. In the course of the discussion we shall also touch on the problems considered in § 9.4, but now regard them from the point of view of invariant properties.

When we speak below about a transformation (or linear transformation) it is to be understood that we mean a linear transformation of $\mathfrak{V}_n$ into itself. All vector spaces we shall consider are subspaces of $\mathfrak{V}_n$. If $x \in \mathfrak{V}_n$ and L is a transformation we shall, for the sake of simplicity, write Lx in place of L(x). If $\mathfrak{U}$ is a vector space we shall denote by $L\mathfrak{U}$ the image space of $\mathfrak{U}$ under L, i.e. the vector space of all vectors of the form Lx, where $x \in \mathfrak{U}$. A transformation will be called singular or non-singular according as its representing matrices are singular or non-singular.

Although the theory presented below is stated in terms of vector spaces, it might equally well have been stated in terms of linear manifolds. Indeed, since components of vectors are never mentioned, the difference between statements involving vector spaces and those involving linear manifolds is not one of substance but merely of language and notation.

9.5.1. Definition 9.5.1. The rank R(L) of the transformation L is defined by the equation $R(L) = d(L\mathfrak{V}_n)$.

Let $A = \mathfrak{R}(L; \mathfrak{E})$. Then $Lx = Ax$, and hence $L\mathfrak{V}_n$ consists of all vectors of the form Ax, where $x \in \mathfrak{V}_n$. Hence $d(L\mathfrak{V}_n) = R(A)$, and thus the rank of a transformation is equal to the rank of each of its representing matrices.

Definition 9.5.2. If L is a transformation and $\mathfrak{U}$ a vector space, and if $L\mathfrak{U} = \mathfrak{U}$, then $\mathfrak{U}$ is said to be an invariant space of L.†

Thus a vector space U is said to be an invariant space of L if its 
image space, under L, coincides with the original space. Every 
transformation possesses at least one invariant space, namely, 
that which consists of the zero vector only. 

Definition 9.5.3. If $\mathfrak{U}$ is an invariant space of a transformation L, and if every invariant space of L is a subspace of $\mathfrak{U}$, then $\mathfrak{U}$ is called the maximal invariant space of L.

The uniqueness of this space (if it exists) is inherent in the definition. If the maximal invariant space of L exists, it will be denoted by $\mathfrak{S}(L)$. We shall see presently that it exists, in fact, for every L.


† In the literature the term 'invariant space' is normally used in a slightly different sense. Cf. MacDuffee, 8, 116, or Julia, 24, 69.




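Theorem 9.5.1. Let L be a linear transformation and suppose that, for some non-negative integer k,†
$$L^k\mathfrak{V}_n = L^{k+1}\mathfrak{V}_n.$$
Then $L^k\mathfrak{V}_n$ is the maximal invariant space of L.

Write $\mathfrak{U} = L^k\mathfrak{V}_n$. In view of our hypothesis we have $L\mathfrak{U} = \mathfrak{U}$, and so $\mathfrak{U}$ is an invariant space of L. Moreover, if $\mathfrak{W}$ is an invariant space of L, i.e. if $L\mathfrak{W} = \mathfrak{W}$, then $L^k\mathfrak{W} = \mathfrak{W}$. Hence, since $\mathfrak{W} \subseteq \mathfrak{V}_n$,
$$\mathfrak{W} = L^k\mathfrak{W} \subseteq L^k\mathfrak{V}_n = \mathfrak{U}.$$
Thus $\mathfrak{W} \subseteq \mathfrak{U}$, and therefore $\mathfrak{S}(L) = \mathfrak{U}$.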

Theorem 9.5.2. Every linear transformation L possesses a maximal invariant space. Moreover, there exists a non-negative integer $k_0 = k_0(L)$ such that, for all $k \ge k_0$,
$$\mathfrak{S}(L) = L^k\mathfrak{V}_n.$$

We have
$$\mathfrak{V}_n \supseteq L\mathfrak{V}_n \supseteq L^2\mathfrak{V}_n \supseteq \cdots.$$
Suppose that, for all $i \ge 0$, $L^i\mathfrak{V}_n \neq L^{i+1}\mathfrak{V}_n$. Then, in view of Theorem 2.3.4 (p. 63),
$$n = d(\mathfrak{V}_n) > d(L\mathfrak{V}_n) > d(L^2\mathfrak{V}_n) > \cdots.$$
Since each of the terms involved is a non-negative integer, this infinite chain of strict inequalities cannot be satisfied, and we must therefore have, for some $k_0$,
$$L^{k_0}\mathfrak{V}_n = L^{k_0+1}\mathfrak{V}_n. \qquad (9.5.1)$$
Hence, by Theorem 9.5.1, L possesses a maximal invariant space, which is given by $\mathfrak{S}(L) = L^{k_0}\mathfrak{V}_n$. Moreover, in view of (9.5.1), we clearly have $L^k\mathfrak{V}_n = L^{k_0}\mathfrak{V}_n$ for all $k \ge k_0$. The proof of the theorem is therefore complete.

Exercise 9.5.1. Show that $\mathfrak{S}(L) = L^n\mathfrak{V}_n$.

Exercise 9.5.2. Show that, if is an invariant space of L, then it is the maximal invariant space of L.

Exercise 9.5.3. Show that a linear transformation has $\mathfrak{V}_n$ as its maximal invariant space if and only if it is non-singular.

The argument above shows that if we consider first the image space of $\mathfrak{V}_n$ under L, then the image space of the image space, and so on, we obtain a series of progressively contracting spaces, each contained in the preceding one. After a certain number of steps this process terminates and the renewed application of L does not result in any further contraction. The space reached at this stage is the maximal invariant space of L.

† $L^0$ is defined as the identical transformation.

We may also note that, if $k_0$ denotes the least non-negative integer satisfying (9.5.1), then†
$$n > d(L\mathfrak{V}_n) > \cdots > d(L^{k_0}\mathfrak{V}_n) = d(L^{k_0+1}\mathfrak{V}_n) = \cdots. \qquad (9.5.2)$$
Now, let A be any matrix and let L be the transformation represented by A with respect to any given basis in $\mathfrak{V}_n$. Then $A^k$ represents $L^k$, and in view of (9.5.2) we have
$$n > R(A) > \cdots > R(A^{k_0}) = R(A^{k_0+1}) = R(A^{k_0+2}) = \cdots,$$
where the case $k_0 = 0$ (characterizing non-singular matrices) must again be interpreted in the obvious way. It follows by Theorem 9.4.2 that A is a group matrix if and only if $k_0 = 0$ or 1. Moreover, if A is any matrix and k is a sufficiently large integer, say $k \ge n$, then $A^k$ is a group matrix, since $R(A^k) = R(A^{2k})$.
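The chain of ranks in the last display is easy to compute. The sketch below uses illustrative names and exact rational arithmetic. It returns $k_0$ together with the ranks $R(A^0), R(A^1), \dots, R(A^{k_0+1})$. By the remark above, A is a group matrix exactly when $k_0 \le 1$.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def rank(M):
    """Rank of a rational matrix, by Gaussian elimination with exact arithmetic."""
    rows = [[Fraction(v) for v in row] for row in M]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def rank_chain(A):
    """Return (k0, ranks) with ranks[k] = R(A^k) for k = 0, ..., k0 + 1, where
    k0 is the least k such that R(A^k) = R(A^(k+1)); cf. (9.5.1), (9.5.2)."""
    n = len(A)
    power = [[int(i == j) for j in range(n)] for i in range(n)]   # A^0 = I
    ranks = [n]
    while True:                     # the ranks decrease strictly until they repeat
        power = matmul(power, A)
        ranks.append(rank(power))
        if ranks[-1] == ranks[-2]:
            return len(ranks) - 2, ranks

print(rank_chain([[2, 1], [1, 1]]))                     # (0, [2, 2])
print(rank_chain([[1, 1], [1, 1]]))                     # (1, [2, 1, 1])
print(rank_chain([[1, 1, 0], [0, 0, 1], [0, 0, 0]]))    # (2, [3, 2, 1, 1])
print(rank_chain([[0, 1, 0], [0, 0, 1], [0, 0, 0]]))    # (3, [3, 2, 1, 0, 0])
```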

Theorem 9.5.3. Every linear transformation effects an automorphism of each of its invariant spaces.

Let $\mathfrak{U}$ be an invariant space of the transformation L, so that $L\mathfrak{U} = \mathfrak{U}$. To prove the theorem it is sufficient to show that the correspondence $x \to Lx$ of $\mathfrak{U}$ with itself is biunique. Now, if $y \in \mathfrak{U}$, then (since $L\mathfrak{U} = \mathfrak{U}$) there exists a vector $x \in \mathfrak{U}$ such that $Lx = y$. It remains, therefore, to show that if $x, x' \in \mathfrak{U}$, $x \neq x'$, then $Lx \neq Lx'$. This means, in fact, that if $z \in \mathfrak{U}$ and $Lz = 0$, then $z = 0$. To prove this, write $d(\mathfrak{U}) = r$ and let $\{z_1, \dots, z_r\}$ be a basis of $\mathfrak{U}$. The vectors $Lz_1, \dots, Lz_r$ then obviously span $L\mathfrak{U} = \mathfrak{U}$. Hence these vectors constitute a basis of $\mathfrak{U}$ and so are linearly independent. Now, if
$$z = \alpha_1z_1 + \cdots + \alpha_rz_r,$$
then
$$0 = Lz = \alpha_1Lz_1 + \cdots + \alpha_rLz_r.$$
Hence $\alpha_1 = \cdots = \alpha_r = 0$ and so $z = 0$. The required conclusion is therefore established.

Exercise 9.5.4. Show that if L effects an automorphism of $\mathfrak{U}$, then $\mathfrak{U}$ is an invariant space of L.

Corollary. A linear transformation effects an automorphism of its maximal invariant space.

† For $k_0 = 0$ these relations must be interpreted as $n = d(L\mathfrak{V}_n) = d(L^2\mathfrak{V}_n) = \cdots$.




Definition 9.5.4. The transformation L annihilates the vector space $\mathfrak{U}$ if $Lx = 0$ for all $x \in \mathfrak{U}$.

Theorem 9.5.4. If $\mathfrak{U} = L^k\mathfrak{V}_n$ is the maximal invariant space of the linear transformation L, then L effects an automorphism of $\mathfrak{U}$, and $L^k$ annihilates one and only one complement of $\mathfrak{U}$.

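The first part of the assertion holds by the corollary to Theorem 9.5.3. To prove the second, write $d(\mathfrak{U}) = r$, and let $\{x_1, \dots, x_r\}$ be a basis of $\mathfrak{U}$. Take $x_{r+1}, \dots, x_n$ such that the vectors
$$x_1, \dots, x_r, x_{r+1}, \dots, x_n \qquad (9.5.3)$$
constitute a basis of $\mathfrak{V}_n$, and put
$$y_i = L^kx_i \qquad (i = r+1, \dots, n).$$
Then $y_i \in \mathfrak{U}$ $(i = r+1, \dots, n)$.

Now $\mathfrak{U}$ is, by hypothesis, an invariant space of L. Hence it follows that $L^k\mathfrak{U} = \mathfrak{U}$. Thus $\mathfrak{U}$ is an invariant space of $L^k$ and so, by Theorem 9.5.3, $L^k$ effects an automorphism of $\mathfrak{U}$. Hence there exist vectors $z_i$ such that
$$L^kz_i = y_i, \qquad z_i \in \mathfrak{U} \qquad (i = r+1, \dots, n).$$
Put
$$x_i' = x_i - z_i \qquad (i = r+1, \dots, n)$$
and consider the vectors
$$x_1, \dots, x_r, x_{r+1}', \dots, x_n'. \qquad (9.5.4)$$
Since $z_{r+1}, \dots, z_n \in \mathfrak{U}$, it follows that every vector in (9.5.3) is a linear combination of vectors in (9.5.4). Hence the vectors in (9.5.4) constitute a basis of $\mathfrak{V}_n$. Therefore, in view of Theorem 2.3.8 (p. 56), the vector space $\mathfrak{U}'$ spanned by $x_{r+1}', \dots, x_n'$ is a complement of $\mathfrak{U}$. Moreover, for $i = r+1, \dots, n$,
$$L^kx_i' = L^kx_i - L^kz_i = y_i - y_i = 0.$$
Hence $L^k$ annihilates $\mathfrak{U}'$. To show that $L^k$ annihilates no other complement of $\mathfrak{U}$, let $x \in \mathfrak{V}_n$ and write
$$x = \alpha_1x_1 + \cdots + \alpha_rx_r + \alpha_{r+1}x_{r+1}' + \cdots + \alpha_nx_n'.$$
Then
$$L^kx = \alpha_1L^kx_1 + \cdots + \alpha_rL^kx_r.$$
But $L^k$ effects an automorphism of $\mathfrak{U}$, and so $L^kx_1, \dots, L^kx_r$ are linearly independent. Hence $L^kx = 0$ implies $\alpha_1 = \cdots = \alpha_r = 0$, i.e. $x \in \mathfrak{U}'$. Thus every vector annihilated by $L^k$ lies in $\mathfrak{U}'$; and since every complement of $\mathfrak{U}$ has dimensionality $n - r = d(\mathfrak{U}')$, a complement annihilated by $L^k$ must coincide with $\mathfrak{U}'$. This completes the proof.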
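In matrix terms, Theorem 9.5.4 says the following: once $k \ge k_0$, the column space of $A^k$ (playing the part of $\mathfrak{U}$) and the null space of $A^k$ (the complement annihilated by $L^k$) are complements in $\mathfrak{V}_n$. The sketch below checks this with exact rational arithmetic. Bases are read off from the reduced row-echelon form, and the helper names are hypothetical. The sample matrix has $k_0 = 2$, so the two spaces fail to be complements for $k = 1$ but are complements for $k = 2$.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def matpow(A, k):
    """A^k, with A^0 = I."""
    n = len(A)
    P = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for _ in range(k):
        P = matmul(P, A)
    return P

def rref(M):
    """Reduced row-echelon form of M (exact arithmetic) and its pivot columns."""
    rows = [[Fraction(v) for v in row] for row in M]
    pivots, r = [], 0
    for c in range(len(rows[0])):
        p = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        piv = rows[r][c]
        rows[r] = [v / piv for v in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
    return rows, pivots

def image_basis(M):
    """The columns of M in the pivot positions span its column space."""
    _, pivots = rref(M)
    return [[row[c] for row in M] for c in pivots]

def kernel_basis(M):
    """One basis vector of {x : Mx = 0} for each non-pivot column of rref(M)."""
    R, pivots = rref(M)
    n = len(M[0])
    basis = []
    for f in (c for c in range(n) if c not in pivots):
        x = [Fraction(0)] * n
        x[f] = Fraction(1)
        for i, c in enumerate(pivots):
            x[c] = -R[i][f]
        basis.append(x)
    return basis

def complements(A, k):
    """True when the column space and the null space of A^k are complements."""
    P = matpow(A, k)
    vectors = image_basis(P) + kernel_basis(P)
    as_columns = [list(row) for row in zip(*vectors)]
    return len(rref(as_columns)[1]) == len(A)   # the vectors form a basis

A = [[1, 1, 0], [0, 0, 1], [0, 0, 0]]
print(complements(A, 1), complements(A, 2))     # False True
```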


It may be noted at this stage that Theorem 9.4.2 follows easily from the results of the present section. We recall that the critical step in the proof of Theorem 9.4.2 consists in showing that, if $R(A) = R(A^2)$, then A is similar to a matrix of type (9.4.1). This inference can be made as follows. Let L be defined by the equation $A = \mathfrak{R}(L; \mathfrak{E})$. We then have $R(L) = R(L^2)$ and therefore $L\mathfrak{V}_n = L^2\mathfrak{V}_n$, so that (by Theorem 9.5.1) $\mathfrak{U} = L\mathfrak{V}_n$ is the maximal invariant space of L. Hence, by Theorem 9.5.4, L effects an automorphism of $\mathfrak{U}$ and annihilates a certain complement $\mathfrak{U}'$ of $\mathfrak{U}$. Let $\{x_1, \dots, x_r\}$ be a basis of $\mathfrak{U}$ and $\{x_{r+1}', \dots, x_n'\}$ a basis of $\mathfrak{U}'$. Then $\mathfrak{X} = \{x_1, \dots, x_r, x_{r+1}', \dots, x_n'\}$ is a basis of $\mathfrak{V}_n$, and the matrix representing L with respect to $\mathfrak{X}$ is of type (9.4.1) and is also similar to A.

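The next result may be regarded as a converse of Theorem 9.5.4.

Theorem 9.5.5. Let $\mathfrak{U}$ be a vector space, $\mathfrak{U}'$ a complement of $\mathfrak{U}$, L a linear transformation, and k a non-negative integer. If L effects an automorphism of $\mathfrak{U}$ and $L^k$ annihilates $\mathfrak{U}'$, then $\mathfrak{U}$ is the maximal invariant space of L, and $\mathfrak{U} = L^k\mathfrak{V}_n$.

Let $x \in \mathfrak{V}_n$ and write $x = y + y'$, where $y \in \mathfrak{U}$, $y' \in \mathfrak{U}'$. Then
$$L^kx = L^ky + L^ky' = L^ky.$$
Thus $L^k\mathfrak{V}_n \subseteq L^k\mathfrak{U}$, and since trivially $L^k\mathfrak{U} \subseteq L^k\mathfrak{V}_n$ we have
$$L^k\mathfrak{V}_n = L^k\mathfrak{U}.$$
But, since L effects an automorphism of $\mathfrak{U}$, we know (by Exercise 9.5.4) that $\mathfrak{U}$ is an invariant space of L, i.e. $L\mathfrak{U} = \mathfrak{U}$. Hence $L^k\mathfrak{U} = \mathfrak{U}$ and therefore
$$\mathfrak{U} = L^k\mathfrak{V}_n.$$
In exactly the same way (since $L^{k+1}$ also annihilates $\mathfrak{U}'$) we obtain $\mathfrak{U} = L^{k+1}\mathfrak{V}_n$, and therefore, by Theorem 9.5.1, $\mathfrak{S}(L) = \mathfrak{U} = L^k\mathfrak{V}_n$.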

We know by Theorem 9.5.2 that each transformation possesses a unique maximal invariant space. On the other hand, if a vector space $\mathfrak{U}$ is given, there exist infinitely many transformations having $\mathfrak{U}$ as their maximal invariant space. The procedure for constructing all such transformations is as follows. We take an arbitrary complement $\mathfrak{U}'$ of $\mathfrak{U}$ and an arbitrary non-negative integer k, and consider all transformations L having the property that L effects an automorphism of $\mathfrak{U}$ while $L^k$ annihilates $\mathfrak{U}'$. Then, by Theorem 9.5.5, for each such transformation L we have $\mathfrak{S}(L) = \mathfrak{U}$. The set of all possible transformations of the type described (with $\mathfrak{U}'$ and k varying) is, by Theorem 9.5.4, identical with the set of all transformations having $\mathfrak{U}$ as maximal invariant space.

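9.5.2. So far we have been concerned with relations between transformations and their invariant spaces. A good deal of further information can be obtained by considering not individual transformations but groups of transformations.

Theorem 9.5.6. All elements of a group of linear transformations have the same maximal invariant space.

Let Γ be a group of transformations, and let $L, M \in \Gamma$. We know, by Theorem 9.5.2, that there exists a positive integer k such that
$$\mathfrak{S}(M) = M^k\mathfrak{V}_n.$$
Hence $\mathfrak{S}(M) \subseteq M\mathfrak{V}_n$ and, in the same way, $\mathfrak{S}(L) \subseteq L\mathfrak{V}_n$. Now there exists a transformation $N \in \Gamma$ such that $L = M^kN$. Hence
$$\mathfrak{S}(L) \subseteq L\mathfrak{V}_n = M^kN\mathfrak{V}_n \subseteq M^k\mathfrak{V}_n = \mathfrak{S}(M). \qquad (9.5.5)$$
Thus $\mathfrak{S}(L) \subseteq \mathfrak{S}(M)$, and by symmetry $\mathfrak{S}(M) \subseteq \mathfrak{S}(L)$. Therefore
$$\mathfrak{S}(L) = \mathfrak{S}(M) \qquad (L, M \in \Gamma), \qquad (9.5.6)$$
and the theorem is established.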


In view of the result just proved, we may speak of the maximal invariant space of a group Γ of transformations, and we shall denote this space by $\mathfrak{S}(\Gamma)$.

Theorem 9.5.7. If Γ is a group of linear transformations and $L \in \Gamma$, then
$$\mathfrak{S}(\Gamma) = L\mathfrak{V}_n.$$

This result is an immediate consequence of (9.5.5) and (9.5.6).


Theobem 9.6.8, The unit element of a group F of linear trans- 
formations effects the identical automorphism of S(F). 

Denote by E the unit element of F and let x e S(F). Since, by 
Theorem 9.6.3, E effects an automorphism of S(F), there exists a 
vector y e S(F) such that Ey — x. Now E^ = E, and so 


X = Ey — E^ = E{Ey) = Ex. 
The theorem is therefore proved. 


Theorem 9.5.9. Let $\mathfrak{U}$ be a vector space and $\mathfrak{U}'$ a complement of $\mathfrak{U}$. The set Γ of all linear transformations which effect automorphisms of $\mathfrak{U}$ and annihilate $\mathfrak{U}'$ is a group with $\mathfrak{S}(\Gamma) = \mathfrak{U}$.

We know, by Theorem 9.5.5, that all transformations of the given type have $\mathfrak{U}$ as their maximal invariant space. Hence we need only show that Γ is a group.

We first verify that Γ is closed under multiplication of transformations. Let $L, M \in \Gamma$. Then LM obviously annihilates $\mathfrak{U}'$. Moreover, since L and M effect automorphisms of $\mathfrak{U}$, it follows that $\mathfrak{U}$ is an invariant space of both these transformations. Then $LM\mathfrak{U} = L\mathfrak{U} = \mathfrak{U}$, and so $\mathfrak{U}$ is an invariant space of LM. Hence, by Theorem 9.5.3, LM effects an automorphism of $\mathfrak{U}$, and so $LM \in \Gamma$.

Next, let E be the transformation in Γ which effects the identical automorphism of $\mathfrak{U}$.† If $x \in \mathfrak{V}_n$ we write $x = y + y'$, where $y \in \mathfrak{U}$, $y' \in \mathfrak{U}'$. Then
$$Ex = Ey + Ey' = y;$$
and if $L \in \Gamma$ we have, similarly, $Lx = Ly$. But, by Theorem 9.5.5, $\mathfrak{U} = L\mathfrak{V}_n$, and so $Lx \in \mathfrak{U}$ $(x \in \mathfrak{V}_n)$. Therefore
$$ELx = Lx = Ly = LEx.$$
Thus
$$EL = LE = L \qquad (L \in \Gamma). \qquad (9.5.7)$$
Again, denote by $L^*$ the automorphism which L effects in $\mathfrak{U}$. The set of all automorphisms $L^*$, where $L \in \Gamma$, is, in fact, the set of all automorphisms of $\mathfrak{U}$; and by Theorem 9.3.1 (p. 268) this set is a group. The unit element of this group is evidently $E^*$, and with each element $L^*$ there is associated its inverse element $(L^*)^{-1}$. We define the transformation $L^{-1} \in \Gamma$ by the requirements
$$L^{-1}x = (L^*)^{-1}x \quad (x \in \mathfrak{U}), \qquad L^{-1}x = 0 \quad (x \in \mathfrak{U}').$$
If $x \in \mathfrak{U}$, then $Lx \in \mathfrak{U}$ and we have
$$L^{-1}Lx = (L^*)^{-1}L^*x = x = Ex,$$
$$LL^{-1}x = L^*(L^*)^{-1}x = x = Ex.$$
Also, if $x \in \mathfrak{U}'$, then $Lx = 0 = L^{-1}x$, and so
$$L^{-1}Lx = 0 = Ex, \qquad LL^{-1}x = 0 = Ex.$$
Thus
$$L^{-1}L = LL^{-1} = E \qquad (L \in \Gamma). \qquad (9.5.8)$$
Equations (9.5.7) and (9.5.8) complete the argument showing that Γ is a group.
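A matrix picture of the group just constructed may help. Assume, for this sketch only, that the columns of a non-singular matrix S consist of a basis of $\mathfrak{U}$ followed by a basis of $\mathfrak{U}'$. Then the elements of the group are represented by $S\,\mathrm{diag}(X, O)\,S^{-1}$ with X non-singular. The unit element E is represented by $S\,\mathrm{diag}(I, O)\,S^{-1}$, and the inverse constructed in the proof by $S\,\mathrm{diag}(X^{-1}, O)\,S^{-1}$. The code checks (9.5.7) and (9.5.8) for one such element. The particular S and X, and all helper names, are arbitrary choices.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def inverse(M):
    """Inverse of a non-singular rational matrix by Gauss-Jordan elimination."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for c in range(n):
        p = next(i for i in range(c, n) if aug[i][c] != 0)
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for i in range(n):
            if i != c and aug[i][c] != 0:
                f = aug[i][c]
                aug[i] = [a - f * b for a, b in zip(aug[i], aug[c])]
    return [row[n:] for row in aug]

def block(X, n):
    """The n x n matrix diag(X, O) with X in the top left-hand corner."""
    r = len(X)
    return [[Fraction(X[i][j]) if i < r and j < r else Fraction(0)
             for j in range(n)] for i in range(n)]

# Columns of S: a basis (1,0,1), (0,1,1) of U followed by a basis (1,1,0) of U'.
S = [[1, 0, 1], [0, 1, 1], [1, 1, 0]]
S_inv = inverse(S)
X = [[2, 1], [1, 1]]                          # any non-singular 2 x 2 matrix

def conj(D):
    return matmul(matmul(S, D), S_inv)

L = conj(block(X, 3))
L_inv = conj(block(inverse(X), 3))            # the inverse built in the proof
E = conj(block([[1, 0], [0, 1]], 3))          # the unit element of the group

print(matmul(E, L) == L == matmul(L, E))               # True, cf. (9.5.7)
print(matmul(L_inv, L) == E == matmul(L, L_inv))       # True, cf. (9.5.8)
print(matmul(L, [[1], [1], [0]]) == [[0], [0], [0]])   # True: L annihilates U'
```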

† It should be borne in mind that a linear transformation is determined completely if it is defined on each of two complementary spaces.




We shall denote the group of the theorem just proved by
$$\Gamma(\mathfrak{U}; \mathfrak{U}').$$

Next, consider any group Γ of linear transformations. If $L \in \Gamma$, then, by Theorem 9.5.7, $\mathfrak{S}(\Gamma) = L\mathfrak{V}_n$. Hence, by Theorem 9.5.4, L annihilates a certain definite complement of $\mathfrak{S}(\Gamma)$. We therefore wish to inquire whether different elements of Γ annihilate different complements of $\mathfrak{S}(\Gamma)$. It is easy to see that this is not the case.

Theorem 9.5.10. All elements of a group Γ of linear transformations annihilate the same complement of $\mathfrak{S}(\Gamma)$.

Let E, the unit element of Γ, annihilate a complement $\mathfrak{U}'$ of $\mathfrak{S}(\Gamma)$. If $L \in \Gamma$ and $x \in \mathfrak{U}'$, then
$$Lx = LEx = L0 = 0.$$
Thus L annihilates $\mathfrak{U}'$ and, in view of Theorem 9.5.4, it annihilates no other complement of $\mathfrak{S}(\Gamma)$.

Corollary. Every group of linear transformations having the vector space $\mathfrak{U}$ as its maximal invariant space is a subgroup of one of the groups $\Gamma(\mathfrak{U}; \mathfrak{U}')$.

We are now in a position to describe the construction of all groups having a preassigned maximal invariant space.

Theorem 9.5.11. Let $\mathfrak{U}$ be a vector space. The set of all groups $\Gamma(\mathfrak{U}; \mathfrak{U}')$, where $\mathfrak{U}'$ is a variable complement of $\mathfrak{U}$, together with all their subgroups, is identical with the set $S(\mathfrak{U})$ of all groups of linear transformations having $\mathfrak{U}$ as their maximal invariant space.

If $\mathfrak{U}'$ is any complement of $\mathfrak{U}$, then we know by Theorem 9.5.9 that $\Gamma(\mathfrak{U}; \mathfrak{U}')$ and all its subgroups are elements of $S(\mathfrak{U})$. Conversely, if $\Gamma \in S(\mathfrak{U})$, then, by the corollary just proved, Γ is a subgroup of one of the groups $\Gamma(\mathfrak{U}; \mathfrak{U}')$.

Theorem 9.5.12. Any group Γ of linear transformations is isomorphic to a group of automorphisms of $\mathfrak{S}(\Gamma)$.

By the corollary to Theorem 9.5.3 every $L \in \Gamma$ effects an automorphism, say $L^*$, of $\mathfrak{S}(\Gamma)$. The set $\Gamma^*$ of all automorphisms $L^*$, where $L \in \Gamma$, is a group. For if E is the unit element of Γ, then $E^*$ is the unit element of $\Gamma^*$; and the element $(L^*)^{-1}$ defined by the equation
$$(L^*)^{-1}x = L^{-1}x \qquad (x \in \mathfrak{S}(\Gamma))$$
is the inverse of $L^*$. Since, then, $\Gamma^*$ is a group and since obviously
$$(LM)^* = L^*M^* \qquad (L, M \in \Gamma),$$
it only remains to show that the correspondence $L \leftrightarrow L^*$ between Γ and $\Gamma^*$ is biunique. This will be done by showing that, if $L^* = M^*$, then $L = M$. Now $L^* = M^*$ means that
$$L^*x = M^*x \qquad (x \in \mathfrak{S}(\Gamma)),$$
i.e.
$$Lx = Mx \qquad (x \in \mathfrak{S}(\Gamma)).$$
Hence, for $x \in \mathfrak{S}(\Gamma)$,
$$L^{-1}Mx = L^{-1}Lx = Ex,$$
and so, by Theorem 9.5.8,
$$L^{-1}Mx = x \qquad (x \in \mathfrak{S}(\Gamma)). \qquad (9.5.9)$$
Now, if $x \in \mathfrak{V}_n$, then $Ex \in E\mathfrak{V}_n$, and so, by Theorem 9.5.7, $Ex \in \mathfrak{S}(\Gamma)$. Hence, by (9.5.9), $L^{-1}MEx = Ex$, that is, $L^{-1}Mx = Ex$; multiplying on the left by L we obtain $Mx = Lx$ $(x \in \mathfrak{V}_n)$. Thus $L = M$, and the proof is complete.

If in the preceding argument we put $\Gamma = \Gamma(\mathfrak{U}; \mathfrak{U}')$, then $\Gamma^*$ becomes the group of all automorphisms of $\mathfrak{S}(\Gamma)$. In view of Theorem 9.3.1 (p. 268) we therefore have the following result.

Corollary 1. If $\mathfrak{U}$ is a vector space of dimensionality $r$, and $\mathfrak{U}'$ is any complement of $\mathfrak{U}$, then $\Gamma(\mathfrak{U}; \mathfrak{U}')$ is isomorphic to $GL(r)$.

As an immediate inference we obtain: 

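Corollary 2. If $\mathfrak{U}$ is a vector space and $\mathfrak{U}'$, $\mathfrak{U}''$ are any two complements of $\mathfrak{U}$, then $\Gamma(\mathfrak{U}; \mathfrak{U}') \cong \Gamma(\mathfrak{U}; \mathfrak{U}'')$.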

From Theorem 9.5.12 we may easily deduce Theorem 9.4.4. Let $\Gamma$ be a group of matrices of rank $r$, and denote by $\Gamma'$ the set of transformations represented by these matrices with respect to any fixed basis in $\mathfrak{V}_n$. Then, by Exercise 9.3.2 (p. 270), $\Gamma'$ is a group isomorphic to $\Gamma$. Moreover, if $L \in \Gamma'$ and $L$ is represented by $A \in \Gamma$, then, by Theorem 9.5.7, $\mathfrak{S}(\Gamma') = L\mathfrak{V}_n$, and therefore

$$d\{\mathfrak{S}(\Gamma')\} = d(L\mathfrak{V}_n) = R(A) = r.$$

Hence, in view of Theorem 9.5.12, $\Gamma'$ is isomorphic to a group $\Gamma^*$ of automorphisms of the $r$-dimensional vector space $\mathfrak{S}(\Gamma')$. But, by Exercise 9.3.1, $\Gamma^*$ is isomorphic to a group $\Gamma''$ of non-singular $r \times r$ matrices. Thus $\Gamma \cong \Gamma''$, and Theorem 9.4.4 is established once again.

It is easy to see that not every transformation is an element in a group of transformations. By making use of the results of § 9.4 it would, of course, be easy to obtain criteria for deciding whether a given transformation is an element (or possibly the unit element) of some group of transformations. However, we prefer to deduce these criteria without appealing to the theory of matrices.

Theorem 9.5.13. The following statements are equivalent to each other.

(i) $L$ is an element in a group of linear transformations.

(ii) $R(L) = R(L^2)$.

(iii) $\mathfrak{S}(L) = L\mathfrak{V}_n$.

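If (i) is given, let $L$ be an element of a group $\Gamma$. Then $L^2 \in \Gamma$ as well, and so, by Theorem 9.5.7, $L\mathfrak{V}_n = \mathfrak{S}(\Gamma) = L^2\mathfrak{V}_n$. Hence $R(L) = R(L^2)$, and so (i) implies (ii).

Next, let (ii) be given, i.e. $d(L\mathfrak{V}_n) = d(L^2\mathfrak{V}_n)$. Since $L^2\mathfrak{V}_n \subseteq L\mathfrak{V}_n$, it follows that

$$L^2\mathfrak{V}_n = L\mathfrak{V}_n,$$

and therefore, by Theorem 9.5.1, $\mathfrak{S}(L) = L\mathfrak{V}_n$. Thus (ii) implies (iii).

Finally, let (iii) be given and write $\mathfrak{U} = \mathfrak{S}(L) = L\mathfrak{V}_n$. Then, by Theorem 9.5.4, $L$ effects an automorphism of $\mathfrak{U}$ and annihilates a certain complement $\mathfrak{U}'$ of $\mathfrak{U}$. Therefore $L$ is an element of the group $\Gamma(\mathfrak{U}; \mathfrak{U}')$. The condition (iii) therefore implies (i), and the theorem is proved.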
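Identify $L$ with the matrix $A$ that represents it in a fixed basis, so that $R(L) = R(A)$. Condition (ii) then becomes a mechanical test. What follows is a minimal Python sketch of that test, offered as an illustration and not as part of the text. The helper names `rank`, `matmul` and `is_group_element` are our own. Entries are assumed rational and are handled exactly with `fractions.Fraction`, so that rounding cannot distort the ranks.

```python
from fractions import Fraction


def rank(matrix):
    """Rank of a matrix (a list of rows), by exact Gaussian elimination."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    r = 0  # number of pivot rows found so far
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r


def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def is_group_element(a):
    """Condition (ii) of Theorem 9.5.13 for the matrix A: R(A) = R(A^2)."""
    return rank(a) == rank(matmul(a, a))


print(is_group_element([[1, 0], [0, 0]]))  # True  (a projection)
print(is_group_element([[0, 1], [0, 0]]))  # False (nilpotent)
print(is_group_element([[1, 1], [1, 1]]))  # True  (singular, yet R(A^2) = R(A))
```

The third example shows that a singular matrix may still be an element of a group of matrices.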

Exercise 9.5.5. Deduce from the results just proved the equivalence of the conditions (i) and (ii) in Theorem 9.4.2 (p. 273).

Theorem 9.5.14. The linear transformation $L$ is the unit element of a group of linear transformations if and only if $L^2 = L$.

If $L$ is the unit element of a group, then obviously $L^2 = L$. If, on the other hand, $L^2 = L$, then $R(L^2) = R(L)$ and so, by Theorem 9.5.13, $L$ is an element in a group $\Gamma$ of linear transformations. Hence it possesses an inverse element $L^{-1}$, and so $L^{-1}L^2 = L^{-1}L$, i.e. $L = L^{-1}L$. Hence $L$ is the unit element of $\Gamma$.
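Theorem 9.5.14 can likewise be checked on matrices. The sketch below is our own illustration, and the helpers `is_idempotent` and `scaled_ones` are hypothetical names. It tests idempotency. As an example of our choosing, it also verifies that $E = \tfrac12\begin{pmatrix}1&1\\1&1\end{pmatrix}$ acts as the unit element for the singular matrices $\begin{pmatrix}t&t\\t&t\end{pmatrix}$ $(t \neq 0)$. These matrices are closed under multiplication: the product of the matrices with all elements $s$ and all elements $t$ has all elements $2st$.

```python
from fractions import Fraction


def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def is_idempotent(a):
    """Theorem 9.5.14 in matrix form: A is a unit element iff A^2 = A."""
    return matmul(a, a) == a


def scaled_ones(t):
    """The 2 x 2 matrix all of whose elements equal t."""
    return [[Fraction(t), Fraction(t)], [Fraction(t), Fraction(t)]]


E = scaled_ones(Fraction(1, 2))
print(is_idempotent(E))                 # True
print(is_idempotent([[0, 1], [0, 0]]))  # False

# E acts as unit element on every scaled_ones(t), t != 0,
# and scaled_ones(1/(4t)) is the inverse of scaled_ones(t).
for t in (Fraction(1), Fraction(-3), Fraction(2, 5)):
    M = scaled_ones(t)
    assert matmul(E, M) == M == matmul(M, E)
    assert matmul(M, scaled_ones(1 / (4 * t))) == E
```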

Exercise 9.5.6. Deduce from the result just proved the equivalence of the conditions (i) and (ii) of Theorem 9.4.3 (p. 276).

PROBLEMS ON CHAPTER IX 

1. Show that the group of translations of a plane is isomorphic to the 
additive group of complex numbers. 

2. Show that the additive group of residue classes (mod $m$) is isomorphic to the multiplicative group of $m$th roots of unity.

3. $H$ is a subset of a group $G$. Show that $H$ is a subgroup of $G$ if and only if both the following conditions are satisfied: (i) if $a, b \in H$, then $ab \in H$; (ii) if $a \in H$, then $a^{-1} \in H$. Verify that conditions (i) and (ii) can be replaced by the single equivalent condition: if $a, b \in H$, then $ab^{-1} \in H$.

Show also that in the case of finite groups condition (ii) can be omitted.





4. $H$ is a subset of a group $G$ and, for any elements $a, b$ of $G$, we write $a \sim b$ whenever $a^{-1}b \in H$. Show that $\sim$ is an equivalence relation if and only if $H$ is a subgroup of $G$. When $\sim$ is an equivalence relation, show that one equivalence class and one only is a subgroup of $G$.

5. An automorphism of a group $G$ is an isomorphic mapping of $G$ into itself. Show that the set of all automorphisms of $G$ is a group $\Gamma$. Show further that, when $G$ is a Klein four-group, $\Gamma$ is isomorphic to the symmetric group $S_3$.

6. Let $G$ be a group and $x$ a fixed element of $G$. The transformation which maps $a \in G$ into $xax^{-1}$ is called an inner automorphism of $G$. Show that an inner automorphism is an automorphism, and that the set of all inner automorphisms of $G$ is a group.

7. Let $G$ be a group. If $a, b \in G$, write $a \sim b$ whenever there exists some $x \in G$ such that $b = xax^{-1}$. Show that $\sim$ is an equivalence relation, and interpret it in the case when $G$ is the full linear group of degree $n$.

8. Show that among the $n!$ permutations of degree $n$ precisely $\frac{1}{2}n!$ are even and that these permutations form a group, the alternating group of degree $n$. Show also that the odd permutations do not form a group.

9. Show that each of the two sets of matrices 

(;;). (-; j). c -?)• c; i 

G 9. (-; -?)■ (j ;)• G -i). 


is a matrix group, and that the two groups are not isomorphic with each 
other. 

10. Show that the matrices ^ (a ^ 0) constitute a group. 

11. Show that all non-singular matrices which commute with a given 
matrix constitute a matrix group. 

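12. Show that the set of matrices of one or the other of the types

$$\begin{pmatrix} \alpha & 0 \\ 0 & \beta \end{pmatrix}, \quad \begin{pmatrix} 0 & \alpha \\ \beta & 0 \end{pmatrix} \quad (\alpha \neq 0, \beta \neq 0)$$

is a matrix group.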

13. Let $A$ be a given non-singular matrix. Show that the set $G$ of all matrices $P$ such that $P'AP = A$ is a matrix group. Interpret this result for the case $A = I$.

14. Let $\Gamma$ be a matrix group, $S$ a fixed non-singular matrix, and $\Gamma^*$ the group of matrices $S^{-1}AS$ $(A \in \Gamma)$. Show that, if $\Omega$ is a matrix such that, for all $A \in \Gamma$, $A'\Omega A = \Omega$, then, for a suitable matrix $\Omega^*$ and all $B \in \Gamma^*$,

$$B'\Omega^*B = \Omega^*.$$

15. Let $\Gamma_1$ and $\Gamma_2$ be groups of non-singular $n \times n$ matrices and suppose that every matrix in $\Gamma_1$ commutes with every matrix in $\Gamma_2$. Show that the set of all matrices of the form $A_1A_2$ $(A_1 \in \Gamma_1, A_2 \in \Gamma_2)$ is again a group.

Show also that this conclusion need not be valid if the condition of commutativity is dispensed with.





16. Show that if a non-singular matrix is an element of a finite matrix group, then its characteristic roots are roots of unity. Show also that the condition of finiteness cannot be dispensed with.

17. Show that the characteristic roots of every permutation matrix are roots of unity.

18. Obtain a representation, in terms of real matrices, for the multiplicative group of $m$th roots of unity.

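19. Obtain a matrix representation for the group of permutations

$$\begin{pmatrix} 1 & 2 & 3 & 4 \\ 1 & 2 & 3 & 4 \end{pmatrix}, \quad \begin{pmatrix} 1 & 2 & 3 & 4 \\ 2 & 1 & 4 & 3 \end{pmatrix}, \quad \begin{pmatrix} 1 & 2 & 3 & 4 \\ 3 & 4 & 1 & 2 \end{pmatrix}, \quad \begin{pmatrix} 1 & 2 & 3 & 4 \\ 4 & 3 & 2 & 1 \end{pmatrix}.$$

Also show that this group is isomorphic to the multiplicative group of residue classes (mod 8) prime to 8.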

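20. Show that the multiplicative group $G$ of the numbers $1, -1, i, -i$ is isomorphic to the multiplicative group of residue classes (mod 5) prime to 5, and also to the group of permutations

$$\begin{pmatrix} 1 & 2 & 3 & 4 \\ 1 & 2 & 3 & 4 \end{pmatrix}, \quad \begin{pmatrix} 1 & 2 & 3 & 4 \\ 2 & 3 & 4 & 1 \end{pmatrix}, \quad \begin{pmatrix} 1 & 2 & 3 & 4 \\ 3 & 4 & 1 & 2 \end{pmatrix}, \quad \begin{pmatrix} 1 & 2 & 3 & 4 \\ 4 & 1 & 2 & 3 \end{pmatrix}.$$

Obtain a matrix representation of $G$ in terms of $2 \times 2$ matrices and another matrix representation in terms of $4 \times 4$ matrices.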

21. Let $M_1, \ldots, M_6$ be the six matrices given on p. 272, which represent the symmetric group $S_3$. By considering the matrices $T^{-1}M_iT$ $(i = 1, \ldots, 6)$, where

$$T = \begin{pmatrix} 1 & -1 & 1 \\ 1 & 2 & 1 \\ 1 & -1 & -2 \end{pmatrix},$$

obtain a representation of $S_3$ in terms of $2 \times 2$ matrices.

22. Let $\Gamma$ be a finite matrix group. Show that there exists a hermitian matrix $H$ such that, for all $A \in \Gamma$, $\bar{A}'HA = H$.

23. Show that the set of all upper triangular $n \times n$ matrices, all of whose diagonal elements are equal to 1, is a matrix group.

24. Let $G$ be the set of all $n \times n$ matrices each of which has precisely one non-zero element in each row and each column. Show that $G$ is a matrix group.

25. Show that the set $B$ of bilinear transformations $w = (az + b)/(cz + d)$, where $a, b, c, d$ are complex numbers such that $ad \neq bc$, is a group. Discuss the relation between $B$ and $GL(2)$.

26. Let $A$ be an $m \times n$ matrix, and denote by $\phi(P, Q)$ the operation which transforms $A$ into $PAQ$, where $P$, $Q$ are non-singular matrices of order $m$, $n$ respectively. Show that the set of all operations $\phi(P, Q)$ is a group. Deduce that the set of equivalence transformations (in the sense of Definition 6.2.3, p. 177) is a group.

27. Show that, if $A \neq O$ is an element of a matrix group, then no positive power of $A$ can be equal to $O$.





28. Show that 

/I 

0 

0 

0 



/ ® 

1 

0 

0 

o\ 


1 ® 

0 

0 

1 

0 


\0 

0 

0 

0 

1 / 


\o 

0 

0 

0 

0/ 


is not a group matrix. 

29. Let A and B be square matrices such that i2(A®) = JR(A), 

R(B^) = Show that i2(ABAB) = jR(AB). 

30. $L$ is a linear transformation of $\mathfrak{V}_n$ such that $R(L^2) = R(L)$, and $\mathfrak{N}$ is the subspace of $\mathfrak{V}_n$ annihilated by $L$. Show that $L\mathfrak{V}_n$ and $\mathfrak{N}$ are complements.

31. $L$ is a linear transformation of $\mathfrak{V}_n$; $\mathfrak{U}$ is an $r$-dimensional invariant space of $L$; and $\mathfrak{U}'$ is a complement of $\mathfrak{U}$. Show that, if $x_1, \ldots, x_r$ are linearly independent vectors in $\mathfrak{U}$ and $x_{r+1}, \ldots, x_n$ are linearly independent vectors in $\mathfrak{U}'$, then the matrix representing $L$ with respect to the basis

$$\{x_1, \ldots, x_r, x_{r+1}, \ldots, x_n\}$$

is of the form



where $A_1$ is a non-singular $r \times r$ matrix.


32. Show that any matrix $A$ is similar to a matrix of the form



where $A_1$ is non-singular and, for some value of $k$, $A_2^k = O$. Show further that $A$ is a group matrix if and only if $A_2 = O$.


33. Show that, if the matrices $A_1, \ldots, A_k$ constitute a matrix group, then

$$kR(A_1 + \cdots + A_k) = \mathrm{tr}(A_1 + \cdots + A_k).$$





X 


CANONICAL FORMS 

10.1. The idea of a canonical form 

10.1.1. We have already had occasion to note the importance of similarity transformations.† We shall now investigate the manner in which given matrices can be changed by similarity transformations into matrices of particularly simple types. Thus, given a matrix $A$, we shall consider the transformed matrix $S^{-1}AS$ and shall try to choose $S$ in such a way that $S^{-1}AS$ is as simple as possible. The significance of this procedure in terms of linear mappings is clear in view of Theorem 4.2.5 (p. 119). When a matrix $A$ is given we interpret it as the matrix representing the linear mapping $L$ (of $\mathfrak{V}_n$ into itself) with respect to the basis $\mathfrak{E}$, and we attempt to find a second basis $\mathfrak{X}$ with respect to which $L$ is represented by a matrix having a simple form. The discussion below provides no more than a slight sketch of a large and important subject, since a more systematic treatment would fall outside the scope of the present work.‡

Definition 10.1.1. Let $A$ be a given matrix. If there exists a (non-singular) matrix $S$ such that $S^{-1}AS = \Lambda$ is a diagonal (triangular) matrix, then $\Lambda$ is called a DIAGONAL (TRIANGULAR) CANONICAL FORM OF $A$ UNDER THE SIMILARITY GROUP.

Thus $A$ possesses a diagonal (triangular) canonical form under the similarity group if and only if it is similar to a diagonal (triangular) matrix.

It is often desirable to operate not with the entire similarity 
group but with certain of its subgroups. 

Definition 10.1.2. Let $A$ be a given matrix. If there exists a unitary matrix $S$ such that $S^{-1}AS = \Lambda$ is a diagonal (triangular) matrix, then $\Lambda$ is called a DIAGONAL (TRIANGULAR) CANONICAL FORM OF $A$ UNDER THE UNITARY SIMILARITY GROUP.

† See § 4.2.

‡ For further information see Schreier and Sperner, 4, chap. v; MacDuffee, 8, chaps. vi-viii; Halmos, 14, Appendix I; Ferrar, 15, chap. iv; Turnbull and Aitken, 19, chap. vi; Jacobson, 20, chap. iii; Hamburger and Grimshaw, 21, 122-32; MacDuffee, 22, chaps. iv-vi; MacDuffee, 23, 240-2; Albert, 24, chap. iv; Wedderburn, 28, chap. iii.




An analogous terminology applies if the term ‘orthogonal’ is 
substituted for ‘unitary’. 

The invariant interpretation of orthogonal and unitary canonical forms is, of course, clear in view of Theorem 4.2.8 (p. 121). If $L$ is the linear transformation represented by $A$ with respect to the orthonormal basis $\mathfrak{E}$, then we seek a new orthonormal basis of $\mathfrak{V}_n$ with respect to which $L$ is represented by a diagonal or a triangular matrix.

Fundamentally, the study of canonical forms is the study of intrinsic properties of matrices, and the purpose of obtaining a canonical form $\Lambda$ of $A$ is to have a matrix at our disposal which shall be easier to manipulate than $A$ but possess the same invariant characteristics. We have, in fact,

$$A = S\Lambda S^{-1}, \tag{10.1.1}$$

where $\Lambda$ and $S$ are subject to certain prescribed restrictions. The representation of $A$ in the form (10.1.1) is very convenient for many purposes. Thus, for instance, it may enable us to determine the rank of $A$, since $R(A) = R(\Lambda)$, and $\Lambda$ can generally be dealt with more easily than $A$.† A similar use can be made of (10.1.1) for the evaluation of the trace, since $\mathrm{tr}\,A = \mathrm{tr}\,\Lambda$. Again, if $f$ is a polynomial, then

$$f(A) = Sf(\Lambda)S^{-1}.$$

This identity is indispensable for dealing with many problems which might otherwise prove insoluble.‡

In investigating canonical forms we deal with representations of 
matrices in certain standard forms which exhibit structure and 
facilitate calculation. This procedure may be compared with the 
representation of integers as products of powers of primes or the 
representation of conics in terms of their standard equations. 

In the present chapter we are concerned with canonical forms under similarity transformations, but it will be recalled that an analogous problem for congruence transformations was considered in § 6.4, where it was found that if $A$ is symmetric, then a non-singular matrix $P$ can be found such that $P'AP$ is diagonal. The study of ‘canonical forms under congruence transformations’ will be resumed in Chapter XII.

† See, for example, the proof of Theorem 10.2.3.

‡ See, for example, the proof of Theorem 11.1.1.





10.1.2. Two simple results may be noted at once. 

Theorem 10.1.1. If $\Lambda$ is any canonical form of $A$, then the diagonal elements of $\Lambda$ are the characteristic roots of $A$.†

By Theorem 7.2.1 (p. 199) $A$ and $\Lambda$ have the same characteristic roots, and since $\Lambda$ is diagonal or triangular its diagonal elements are precisely its characteristic roots.

Exercise 10.1.1. Let A be similar to a diagonal matrix. Show that all 
characteristic roots of A are equal if and only if A is a scalar matrix. 

Theorem 10.1.1 naturally leads us to inquire in what order the 
characteristic roots of A appear on the diagonal of a canonical 
form. The next theorem shows that for diagonal canonical forms 
this order may be preassigned arbitrarily. 

Theorem 10.1.2. Suppose that $A$ possesses a diagonal canonical form $\Lambda$ under a group $G$ of transformations. If $\Lambda'$ is a diagonal matrix whose diagonal elements are the characteristic roots of $A$ in any preassigned order, then $\Lambda'$ is also a diagonal canonical form of $A$ under $G$.

We have, by hypothesis,

$$A = S\Lambda S^{-1},$$

where $S$ is of a certain prescribed type (non-singular, unitary, or orthogonal). In view of Theorem 10.1.1, $\Lambda$ and $\Lambda'$ differ at most in the arrangement of their diagonal elements. Hence, by Theorem 7.2.2 (p. 199), there exists a matrix $T$ such that

$$\Lambda = T\Lambda'T^{-1},$$

and it is, in fact, clear from the proof of Theorem 7.2.2 that $T$ is orthogonal. Thus

$$A = (ST)\Lambda'(ST)^{-1},$$

and, since $ST$ is of the same prescribed type as $S$, $\Lambda'$ is therefore a canonical form of $A$ under $G$.
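The matrix $T$ in this argument can be taken to be a permutation matrix. Here is a sketch of our own construction, not quoted from the proof of Theorem 7.2.2. Let $T$ have $e_{p(j)}$ as its $j$th column. Then $T$ is orthogonal, and $T\Lambda'T^{-1} = \Lambda$ whenever the $j$th diagonal element of $\Lambda'$ is the $p(j)$th diagonal element of $\Lambda$. The Python check below confirms this for every ordering of three diagonal elements; its helper names are hypothetical.

```python
from itertools import permutations


def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def diag(entries):
    n = len(entries)
    return [[entries[i] if i == j else 0 for j in range(n)] for i in range(n)]


def reordering_matrix(p):
    """Permutation matrix whose j-th column is e_{p[j]} (p a permutation of 0..n-1)."""
    n = len(p)
    return [[1 if i == p[j] else 0 for j in range(n)] for i in range(n)]


d = [5, -1, 2]                                  # diagonal elements of Lambda
for p in permutations(range(len(d))):
    lam_prime = diag([d[j] for j in p])         # the same elements in another order
    T = reordering_matrix(p)
    assert matmul(T, transpose(T)) == diag([1] * len(d))          # T is orthogonal
    assert matmul(matmul(T, lam_prime), transpose(T)) == diag(d)  # T Lambda' T^{-1} = Lambda
print("verified for all", len(list(permutations(d))), "orderings")
```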


Theorem 10.1.2 remains true (with obvious verbal adjustments) 
for triangular canonical forms. The argument just used is no 
longer effective in this case, but the required conclusion will be 
seen to be an obvious consequence of the proof of Theorem 10.4.1. 

10.2. Diagonal canonical forms under the similarity group 
10.2.1. We begin with a theorem which not only asserts the existence of diagonal canonical forms for certain types of matrices but provides an explicit method for arriving at such canonical forms.

† It is, of course, implied in this statement that each characteristic root of $A$ occurs among the diagonal elements of $\Lambda$ with its correct multiplicity.

Theorem 10.2.1. If $x_1, \ldots, x_n$ are linearly independent characteristic vectors of an $n \times n$ matrix $A$, and $S$ is the (non-singular) matrix having $x_1, \ldots, x_n$ as its columns, then $S^{-1}AS$ is a diagonal matrix.

Denote by $\omega_j$ the value of the characteristic root of $A$ with which the vector $x_j$ is associated.† Thus

$$Ax_j = \omega_j x_j \quad (j = 1, \ldots, n), \tag{10.2.1}$$

and using Theorem 3.3.5 (iii) (p. 84), we obtain

$$(S^{-1}AS)_{ij} = (S^{-1})_{i*}AS_{*j} = (S^{-1})_{i*}Ax_j = (S^{-1})_{i*}\,\omega_j x_j = \omega_j(S^{-1})_{i*}S_{*j} = \omega_j(S^{-1}S)_{ij} = \omega_j I_{ij} = \omega_j\delta_{ij}.$$

Hence

$$S^{-1}AS = \mathrm{dg}(\omega_1, \ldots, \omega_n), \tag{10.2.2}$$

and the theorem is proved. We may arrive at the same conclusion without any calculation by interpreting $A$ as the matrix representing a linear transformation $L$ with respect to the basis $\mathfrak{E}$. Since $x_1, \ldots, x_n$ are linearly independent, $\mathfrak{X} = \{x_1, \ldots, x_n\}$ is a basis of $\mathfrak{V}_n$ and, by Theorem 4.2.8 (p. 121), the matrix $S^{-1}AS$ represents $L$ with respect to $\mathfrak{X}$. We have, by Theorem 4.2.6 and (10.2.1),

$$Lx_j = \omega_j x_j \quad (j = 1, \ldots, n).$$

Moreover, if $S^{-1}AS = (b_{ij})$, then, by (4.2.13),

$$Lx_j = \sum_{i=1}^{n} b_{ij}x_i \quad (j = 1, \ldots, n).$$

Hence $b_{ij} = \omega_j\delta_{ij}$ $(i, j = 1, \ldots, n)$, and this is equivalent to (10.2.2).

Consider, for instance, the matrix

$$A = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}, \tag{10.2.3}$$

whose characteristic roots are $1, \rho, \rho^2$, where $\rho = e^{2\pi i/3}$. If $(x, y, z)'$ is a characteristic vector associated with the characteristic root $\lambda$, then

$$\lambda(x, y, z)' = A(x, y, z)' = (y, z, x)'.$$

† We are not entitled to assume that the numbers $\omega_1, \ldots, \omega_n$ give us all the characteristic roots with correct multiplicities. This is, however, the case, as is seen by (10.2.2).




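Hence $(1, 1, 1)'$, $(1, \rho, \rho^2)'$, $(1, \rho^2, \rho)'$ are three characteristic vectors associated with $1, \rho, \rho^2$ respectively. It is easy to verify that these vectors are linearly independent. Hence the matrix

$$S = \begin{pmatrix} 1 & 1 & 1 \\ 1 & \rho & \rho^2 \\ 1 & \rho^2 & \rho \end{pmatrix}$$

is non-singular, and $S^{-1}AS = \mathrm{dg}(1, \rho, \rho^2)$.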
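This example is easy to confirm numerically. The Python sketch below is an illustration only, and its names are ours. It relies on a property of the columns $(1, w, w^2)'$ of $S$. For two different cube roots of unity $w_i \neq w_j$, the sum $\sum_{k=0}^{2}(w_j/w_i)^k$ vanishes, so the columns are mutually orthogonal under the hermitian inner product, and each has squared length 3. Hence $S^{-1} = \tfrac13\bar{S}'$. Floating-point arithmetic with a small tolerance is assumed.

```python
import cmath


def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def close(x, y, tol=1e-12):
    """Compare two complex numbers up to floating-point rounding."""
    return abs(x - y) < tol


rho = cmath.exp(2j * cmath.pi / 3)      # primitive cube root of unity
A = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]   # the matrix (10.2.3)
roots = [1 + 0j, rho, rho ** 2]

# Column j of S is the characteristic vector (1, w, w^2)' with w = roots[j].
S = [[w ** i for w in roots] for i in range(3)]

# The columns are orthogonal, each of squared length 3, so S^{-1} = conj(S)'/3.
S_inv = [[S[k][i].conjugate() / 3 for k in range(3)] for i in range(3)]

D = matmul(matmul(S_inv, A), S)
print([complex(round(D[i][i].real, 9), round(D[i][i].imag, 9)) for i in range(3)])
print(all(close(D[i][j], 0) for i in range(3) for j in range(3) if i != j))
```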

Exercise 10.2.1. Write down matrices $T$ and $U$ such that

$$T^{-1}AT = \mathrm{dg}(1, \rho^2, \rho), \quad U^{-1}AU = \mathrm{dg}(\rho^2, \rho, 1),$$

where $A$ is the matrix defined by (10.2.3).

Theorem 10.2.2. An $n \times n$ matrix is similar to a diagonal matrix if and only if it possesses $n$ linearly independent characteristic vectors.

An alternative way to express this result is to say that an $n \times n$ matrix is similar to a diagonal matrix if and only if its characteristic vectors span the total vector space $\mathfrak{V}_n$.

In view of Theorem 10.2.1 we need only prove that if $A$ is similar to a diagonal matrix, then it possesses $n$ linearly independent characteristic vectors. Suppose, then, that

$$S^{-1}AS = \Lambda = \mathrm{dg}(\lambda_1, \ldots, \lambda_n),$$

where $\lambda_1, \ldots, \lambda_n$ are, of course, the characteristic roots of $A$. Then, for $i = 1, \ldots, n$, $(AS)_{*i} = (S\Lambda)_{*i}$, and so

$$AS_{*i} = S\Lambda_{*i} = \lambda_i Se_i = \lambda_i S_{*i}.$$

The columns of $S$ are therefore characteristic vectors of $A$; and since $|S| \neq 0$, they are linearly independent.

Exercise 10.2.2. Prove Theorem 10.2.2 by an invariant argument.

Definition 10.2.1. A characteristic root of a matrix is regular if its multiplicity is equal to the maximum number of linearly independent characteristic vectors associated with it.

Thus, a $k$-fold characteristic root $\lambda$ of an $n \times n$ matrix $A$ is said to be regular if $R(\lambda I - A) = n - k$.

It will be recalled that, by the rank-multiplicity theorem (p. 214), we have in every case

$$R(\lambda I - A) \geq n - k.$$

Theorem 10.2.3. A matrix is similar to a diagonal matrix if and only if all its characteristic roots are regular.

This theorem states, in fact, that $A$ is similar to a diagonal matrix if and only if $R(\lambda I - A) = n - m_\lambda(A)$ for every value of $\lambda$.



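Suppose, in the first place, that $A$ is similar to a diagonal matrix, say

$$S^{-1}AS = \Lambda = \mathrm{dg}(\lambda_1, \ldots, \lambda_n).$$

If $\lambda$ is a $k$-fold characteristic root of $A$, then precisely $k$ among the numbers $\lambda_1, \ldots, \lambda_n$ are equal to $\lambda$; and we have

$$R(\lambda I - A) = R\{S^{-1}(\lambda I - A)S\} = R(\lambda I - \Lambda) = R\{\mathrm{dg}(\lambda - \lambda_1, \ldots, \lambda - \lambda_n)\} = n - k.$$

Thus every characteristic root of $A$ is regular.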

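Suppose, next, that all characteristic roots of $A$ are regular. Denote the distinct characteristic roots by $\omega_1, \ldots, \omega_m$ and let $r_1, \ldots, r_m$ be their multiplicities, so that $r_1 + \cdots + r_m = n$. For every $i$ in the range $1 \leq i \leq m$ there exist, by hypothesis, $r_i$ linearly independent vectors $x_j^{(i)}$ $(j = 1, \ldots, r_i)$ such that

$$Ax_j^{(i)} = \omega_i x_j^{(i)} \quad (i = 1, \ldots, m;\ j = 1, \ldots, r_i). \tag{10.2.4}$$

We now assert that the $n$ characteristic vectors $x_j^{(i)}$ $(i = 1, \ldots, m;\ j = 1, \ldots, r_i)$ are, in fact, linearly independent. To prove this, let $t_{ij}$ be scalars such that

$$\sum_{i,j} t_{ij} x_j^{(i)} = 0;$$

here $i$ ranges over the values $1, \ldots, m$ and, for each $i$, $j$ ranges over the values $1, \ldots, r_i$. Let $\kappa$ be any number such that $1 \leq \kappa \leq m$. Then, using (10.2.4), we obtain

$$0 = \prod_{\substack{k=1\\ k \neq \kappa}}^{m} (\omega_k I - A) \sum_{i,j} t_{ij} x_j^{(i)} = \sum_{i,j} t_{ij} \prod_{\substack{k=1\\ k \neq \kappa}}^{m} (\omega_k I - A)\, x_j^{(i)} = \sum_{i,j} t_{ij} \prod_{\substack{k=1\\ k \neq \kappa}}^{m} (\omega_k - \omega_i)\, x_j^{(i)}.$$

Now the product in the expression on the right-hand side certainly vanishes if $i \neq \kappa$, since it then contains the factor $\omega_i - \omega_i$. Hence

$$0 = \sum_{j=1}^{r_\kappa} t_{\kappa j} \prod_{\substack{k=1\\ k \neq \kappa}}^{m} (\omega_k - \omega_\kappa)\, x_j^{(\kappa)}.$$

But the vectors $x_j^{(\kappa)}$ $(j = 1, \ldots, r_\kappa)$ are linearly independent, and therefore

$$t_{\kappa j} \prod_{\substack{k=1\\ k \neq \kappa}}^{m} (\omega_k - \omega_\kappa) = 0 \quad (j = 1, \ldots, r_\kappa).$$

Since the $\omega$'s are distinct, it follows that $t_{\kappa j} = 0$ $(j = 1, \ldots, r_\kappa)$. This holds for every $\kappa$ in the range $1 \leq \kappa \leq m$; and therefore the $n$ characteristic vectors $x_j^{(i)}$ are linearly independent. Hence, by Theorem 10.2.2, $A$ is similar to a diagonal matrix.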
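Theorem 10.2.3 turns diagonalizability into a finite rank computation, provided the characteristic roots and their multiplicities are known exactly. The sketch below assumes that the caller supplies these roots and multiplicities; it does not compute them. It uses exact rational arithmetic, and the helper names are our own.

```python
from fractions import Fraction


def rank(matrix):
    """Rank of a matrix (a list of rows), by exact Gaussian elimination."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r


def all_roots_regular(a, multiplicities):
    """Theorem 10.2.3: each root lam of multiplicity k must satisfy
    R(lam I - A) = n - k.  `multiplicities` maps each distinct
    characteristic root of A to its multiplicity."""
    n = len(a)
    for lam, k in multiplicities.items():
        shifted = [[(lam if i == j else 0) - a[i][j] for j in range(n)]
                   for i in range(n)]
        if rank(shifted) != n - k:
            return False
    return True


print(all_roots_regular([[2, 1], [0, 2]], {2: 2}))  # False: not similar to a diagonal matrix
print(all_roots_regular([[2, 0], [0, 2]], {2: 2}))  # True
```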




Corollary. Suppose that $A$ possesses a diagonal canonical form under the similarity group. Then the number of its zero characteristic roots is equal to $n - R(A)$.

For, by Theorem 10.2.3, $R(A) = R(-A) = n - m_0(A)$, and the assertion follows at once.

Exercise 10.2.3. Show that $A$ is similar to a diagonal matrix if and only if $\lambda I - A$ is a group matrix for every value of $\lambda$.

The criteria of Theorems 10.2.2 and 10.2.3, though valuable 
theoretically, are of very little use in practice; and it is, therefore, 
important to determine some easily recognizable classes of matrices 
which possess diagonal canonical forms. The next theorem deals 
with one such class; others will be found in § 10.3. 

Theorem 10.2.4. If the n characteristic roots of the nxn matrix 
A are distinct, then A is similar to a diagonal matrix. 

The proof is immediate for, since every characteristic root of A 
is simple, every one is regular by the corollary to Theorem 7.6.1 
(p. 216). Hence the assertion follows by Theorem 10.2.3. We shall, 
however, also give an easy alternative proof. 

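Let $\lambda_1, \ldots, \lambda_n$ be the $n$ (distinct) characteristic roots of $A$, and let $x_1, \ldots, x_n$ be any characteristic vectors of $A$ associated with $\lambda_1, \ldots, \lambda_n$ respectively. Thus

$$Ax_j = \lambda_j x_j \quad (j = 1, \ldots, n).$$

Hence, for $k \geq 1$,

$$A^k x_j = A^{k-1}Ax_j = \lambda_j A^{k-1}x_j,$$

and so

$$A^k x_j = \lambda_j^k x_j \quad (k \geq 0;\ j = 1, \ldots, n). \tag{10.2.5}$$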

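Next, suppose $\alpha_1, \ldots, \alpha_n$ are scalars such that

$$\sum_{j=1}^{n} \alpha_j x_j = 0.$$

Premultiplying by $A^k$ and using (10.2.5) we obtain

$$\sum_{j=1}^{n} \lambda_j^k \alpha_j x_j = 0 \quad (k = 0, 1, \ldots, n-1),$$

and this may be written as

$$\sum_{j=1}^{n} a_{kj} y_j = 0 \quad (k = 0, 1, \ldots, n-1),$$

where $a_{kj} = \lambda_j^k$ and $y_j = \alpha_j x_j$. Now

$$\begin{vmatrix} a_{01} & \cdots & a_{0n} \\ a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{n-1,1} & \cdots & a_{n-1,n} \end{vmatrix} = \begin{vmatrix} 1 & 1 & \cdots & 1 \\ \lambda_1 & \lambda_2 & \cdots & \lambda_n \\ \vdots & \vdots & & \vdots \\ \lambda_1^{n-1} & \lambda_2^{n-1} & \cdots & \lambda_n^{n-1} \end{vmatrix},$$

and since the $\lambda$'s are distinct, this determinant does not vanish in view of equation (1.4.5) on p. 17. Hence, by Exercise 5.1.2 (p. 135),

$$y_1 = \cdots = y_n = 0,$$

i.e. $\alpha_1 x_1 = \cdots = \alpha_n x_n = 0$. But $x_1, \ldots, x_n$ are non-zero vectors by hypothesis. Hence $\alpha_1 = \cdots = \alpha_n = 0$, and so $x_1, \ldots, x_n$ are linearly independent. The required result now follows by Theorem 10.2.1.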
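The determinant used here is a Vandermonde determinant. A standard closed form for it is the product of the differences $\lambda_j - \lambda_i$ over $i < j$, which makes its non-vanishing for distinct $\lambda$'s evident. Below is a small exact check in Python, a sketch of our own; `det`, `vandermonde` and `product_of_differences` are hypothetical helpers.

```python
from fractions import Fraction
from math import prod


def det(matrix):
    """Determinant by exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n = len(m)
    result = Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if m[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            result = -result          # a row interchange changes the sign
        result *= m[c][c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            m[r] = [x - factor * y for x, y in zip(m[r], m[c])]
    return result


def vandermonde(lams):
    """The matrix (a_kj) with a_kj = lambda_j^k, k = 0, ..., n-1."""
    return [[lam ** k for lam in lams] for k in range(len(lams))]


def product_of_differences(lams):
    n = len(lams)
    return prod(lams[j] - lams[i] for i in range(n) for j in range(i + 1, n))


for lams in ([1, 2, 4], [0, -1, 3, 5], [1, 1, 2]):
    print(lams, det(vandermonde(lams)), product_of_differences(lams))
```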

Exercise 10.2.4. Give a proof of Theorem 10.2.4 based on the same 
procedure as the proof of Theorem 10.2.3. 

10.2.2. A further criterion for the existence of diagonal canonical forms establishes a link between canonical forms and minimum polynomials.

Theorem 10.2.5. A matrix is similar to a diagonal matrix if and only if the linear factors of its minimum polynomial are distinct.

We shall denote the matrix in question by $A$. Let $\omega_1, \ldots, \omega_k$ be its distinct characteristic roots and $r_1, \ldots, r_k$ their respective multiplicities.

Suppose that $A$ is similar to a diagonal matrix so that, for some $S$, $S^{-1}AS = \mathrm{dg}(\alpha_1, \ldots, \alpha_n)$, where, of course, each $\alpha$ is equal to some $\omega$. Denoting by $\mu(\lambda)$ the polynomial

$$\mu(\lambda) = (\lambda - \omega_1) \cdots (\lambda - \omega_k),$$

we have

$$\mu(A) = S \cdot \mathrm{dg}\{\mu(\alpha_1), \ldots, \mu(\alpha_n)\} \cdot S^{-1} = O.$$

Hence, in view of Theorem 7.4.7 (p. 208), $\mu(\lambda)$ is the minimum polynomial of $A$, and its linear factors are evidently distinct.

Next, suppose that $\mu(\lambda)$, as defined above, is the minimum polynomial of $A$. Then

$$(\omega_1 I - A) \cdots (\omega_k I - A) = (-1)^k \mu(A) = O,$$

and by repeated application of Theorem 5.6.5 (p. 162), we infer that

$$\sum_{i=1}^{k} R(\omega_i I - A) \leq (k-1)n. \tag{10.2.6}$$

Now we know that

$$R(\omega_i I - A) \geq n - r_i \quad (i = 1, \ldots, k).$$

Assume that $A$ is not similar to a diagonal matrix. Then, by Theorem 10.2.3, we have, for some $\kappa$,

$$R(\omega_\kappa I - A) > n - r_\kappa.$$

Hence

$$\sum_{i=1}^{k} R(\omega_i I - A) > \sum_{i=1}^{k} (n - r_i) = kn - n = (k-1)n,$$

and this is contrary to (10.2.6). The proof of the theorem is therefore complete.
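In practice the criterion of Theorem 10.2.5 can be tested without finding the minimum polynomial explicitly. $A$ is similar to a diagonal matrix if and only if $(\omega_1 I - A)\cdots(\omega_k I - A) = O$, where $\omega_1, \ldots, \omega_k$ are its distinct characteristic roots. The first half of the proof gives one direction. Conversely, if the product vanishes, the minimum polynomial divides $\mu(\lambda)$ and so has distinct linear factors. The sketch below assumes the distinct roots are known exactly and passed in by the caller; the helper names are ours.

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def shifted(a, w):
    """The matrix w I - A."""
    n = len(a)
    return [[(w if i == j else 0) - a[i][j] for j in range(n)] for i in range(n)]


def is_diagonalizable(a, distinct_roots):
    """Test (w_1 I - A)...(w_k I - A) = O, where distinct_roots lists the
    distinct characteristic roots w_1, ..., w_k of A exactly."""
    n = len(a)
    product = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for w in distinct_roots:
        product = matmul(product, shifted(a, w))
    return all(entry == 0 for row in product for entry in row)


print(is_diagonalizable([[2, 1], [0, 2]], [2]))                           # False
print(is_diagonalizable([[1, 0, -4], [0, 5, 4], [-4, 4, 3]], [3, -3, 9]))  # True
```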

A matrix $A$ may be said to be periodic if, for some $k \geq 1$, $A^k = I$. We already know by Exercise 7.3.2 (p. 202) that if a matrix is periodic, then its characteristic roots are roots of unity, but that the converse of this is false. Necessary and sufficient conditions for periodicity of matrices can now be deduced.

Theorem 10.2.6. A matrix is periodic if and only if it is similar to a diagonal matrix and has all its characteristic roots equal to roots of unity.

Suppose that $A^k = I$, where $k \geq 1$. Then, writing $f(\lambda) = \lambda^k - 1$, we have $f(A) = O$. Hence, by Theorem 7.4.2 (p. 204), the minimum polynomial $\mu(\lambda)$ of $A$ divides $f(\lambda)$. Now the linear factors of $f(\lambda)$ are distinct, and so therefore are the linear factors of $\mu(\lambda)$. Hence, by Theorem 10.2.5, $A$ is similar to a diagonal matrix. Moreover, every characteristic root of $A$ is, of course, a $k$th root of unity.

Next, suppose that $S^{-1}AS = \Lambda$, where $\Lambda$ is a diagonal matrix, and that each characteristic root of $A$ is a root of unity. Then there exists an integer $k \geq 1$ (for instance, the product of the orders of these roots of unity) such that each characteristic root of $A$ is a $k$th root of unity. We therefore have

$$S^{-1}A^kS = (S^{-1}AS)^k = \Lambda^k = I;$$

hence $A^k = I$, and the theorem is proved.
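Here is a brute-force illustration in Python, a sketch of our own with a hypothetical `period` helper and an arbitrary search bound `max_k`. The matrix (10.2.3) is periodic with period 3. The matrix $\begin{pmatrix}1&1\\0&1\end{pmatrix}$, whose characteristic roots are both equal to 1, is not periodic, since its $k$th power is $\begin{pmatrix}1&k\\0&1\end{pmatrix}$. The bounded search can only report that no period was found up to `max_k`.

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def period(a, max_k=100):
    """Least k with 1 <= k <= max_k and A^k = I, or None if there is none."""
    n = len(a)
    identity = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    power = identity
    for k in range(1, max_k + 1):
        power = matmul(power, a)
        if power == identity:
            return k
    return None


print(period([[0, 1, 0], [0, 0, 1], [1, 0, 0]]))  # 3
print(period([[-1, 0], [0, 1]]))                  # 2
print(period([[1, 1], [0, 1]]))                   # None
```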

10.2.3. One of the many applications of diagonal canonical forms is in the study of matrix equations. Thus, for instance, we might ask for solutions of the equation

$$X^2 = A,$$

where $A$ is a given matrix. Any such solution may, by analogy with the corresponding problem in scalar analysis, be referred to as a square root of $A$. However, while every complex number possesses at least one square root, there exist matrices which have none. For example, it is easy to verify that the equation

$$X^2 = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$$

has no solution.

A more general problem is concerned with the equation

$$f(X) = A, \tag{10.2.7}$$

where $f$ is any given polynomial. We are naturally interested in conditions which will ensure the solubility of this equation.

Theorem 10.2.7. If $A$ is similar to a diagonal matrix and $f$ is a non-constant polynomial, then the equation (10.2.7) is soluble for $X$.

Let $S$ be a matrix such that

$$A = S \cdot \mathrm{dg}(\lambda_1, \ldots, \lambda_n) \cdot S^{-1}, \tag{10.2.8}$$

and let $\mu_1, \ldots, \mu_n$ be any (complex) numbers such that

$$f(\mu_1) = \lambda_1, \ \ldots, \ f(\mu_n) = \lambda_n;$$

such numbers exist since, $f$ being non-constant, each equation $f(\mu) = \lambda_j$ is a polynomial equation of positive degree in $\mu$. Then the matrix $X = S \cdot \mathrm{dg}(\mu_1, \ldots, \mu_n) \cdot S^{-1}$ clearly satisfies (10.2.7).

Each $\mu_j$ in the above proof can, in general, be chosen in $k$ distinct ways, where $k$ is the degree of $f$. Hence our construction yields, in general, $k^n$ solutions of (10.2.7). This reasoning shows, in particular, that a non-singular matrix with distinct characteristic roots possesses at least $2^n$ distinct square roots. More precise information is given by our next result.

Theorem 10.2.8. An $n \times n$ matrix with distinct characteristic roots possesses precisely $2^n$ or $2^{n-1}$ distinct square roots according as it is non-singular or singular.

The given matrix $A$ is similar to a diagonal matrix by Theorem 10.2.4 and may, therefore, be written in the form (10.2.8). Then every matrix

$$X = S \cdot \mathrm{dg}(\sqrt{\lambda_1}, \ldots, \sqrt{\lambda_n}) \cdot S^{-1} \tag{10.2.9}$$

satisfies the equation $X^2 = A$. If $A$ is non-singular, all $\lambda$'s are different from 0 and there are $2^n$ such matrices; if $A$ is singular, then precisely one $\lambda$ is equal to 0 and the number of matrices is $2^{n-1}$. It remains to show that there are no other solutions of $X^2 = A$. Now, if $X$ is any solution, we have

$$(S^{-1}XS)^2 = S^{-1}X^2S = \mathrm{dg}(\lambda_1, \ldots, \lambda_n) = \Lambda,$$

say. Hence

$$(S^{-1}XS)\Lambda = (S^{-1}XS)^3 = \Lambda(S^{-1}XS).$$

Since $S^{-1}XS$ commutes with a diagonal matrix $\Lambda$ whose diagonal elements are distinct, it follows by Exercise 3.3.4 (p. 81) that $S^{-1}XS = M$, where $M$ is diagonal. But

$$M^2 = S^{-1}X^2S = \Lambda,$$

and therefore

$$M = \mathrm{dg}(\sqrt{\lambda_1}, \ldots, \sqrt{\lambda_n}),$$

when suitable values of the square roots are taken. Thus $X$ is of the form (10.2.9), and the proof is complete.
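The count in Theorem 10.2.8 can be observed on a small example. The sketch below is ours. We pick the transforming matrix $S$ shown in the code; it is a hypothetical choice, and its inverse is written out by hand. We form $A = S\,\mathrm{dg}(\lambda_1, \lambda_2, \lambda_3)\,S^{-1}$ with perfect-square $\lambda$'s, so that integer arithmetic suffices, and enumerate the matrices (10.2.9). The code only enumerates that family; that there are no other square roots is the content of the proof above.

```python
from itertools import product


def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def diag(entries):
    """Diagonal matrix dg(entries)."""
    n = len(entries)
    return [[entries[i] if i == j else 0 for j in range(n)] for i in range(n)]


def square_roots_from_form_10_2_9(S, S_inv, one_root_each):
    """All distinct matrices S dg(+-mu_1, ..., +-mu_n) S^{-1}, where
    one_root_each[i] is one chosen square root mu_i of lambda_i."""
    choices = [sorted({mu, -mu}) for mu in one_root_each]  # only [0] when mu = 0
    found = set()
    for signs in product(*choices):
        X = matmul(matmul(S, diag(list(signs))), S_inv)
        found.add(tuple(map(tuple, X)))
    return found


# Hypothetical transforming matrix, chosen so that its inverse is integral.
S = [[1, 1, 0], [0, 1, 1], [0, 0, 1]]
S_inv = [[1, -1, 1], [0, 1, -1], [0, 0, 1]]

counts = {}
for lams, mus in [((1, 4, 9), (1, 2, 3)), ((0, 4, 9), (0, 2, 3))]:
    A = matmul(matmul(S, diag(list(lams))), S_inv)
    roots = square_roots_from_form_10_2_9(S, S_inv, mus)
    for X in roots:
        Xm = [list(row) for row in X]
        assert matmul(Xm, Xm) == A      # each candidate really is a square root
    counts[lams] = len(roots)

print(counts)  # {(1, 4, 9): 8, (0, 4, 9): 4}
```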

When the characteristic roots of A are not all distinct, various 
possibilities may occur. We have already seen that there exist 
matrices which possess no square roots. There also exist matrices 
which possess infinitely many. Thus every matrix of the form 

L) 

satisfies the equation $X^2 = I_2$, and so $I_2$ has infinitely many square roots.
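Here is a concrete family of this kind. It is our own choice, not necessarily the one printed here. Any $2 \times 2$ matrix $X$ with trace 0 and determinant $-1$ satisfies $X^2 = I_2$, for example $X_t = \begin{pmatrix} t & 1 - t^2 \\ 1 & -t \end{pmatrix}$ for every number $t$. A quick exact check:

```python
from fractions import Fraction


def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def root_of_identity(t):
    """X_t = [[t, 1 - t^2], [1, -t]]: trace 0, determinant -1, hence X_t^2 = I_2."""
    return [[t, 1 - t * t], [1, -t]]


ts = [Fraction(k, 3) for k in range(-6, 7)]   # thirteen sample values of t
roots = [root_of_identity(t) for t in ts]
print(all(matmul(X, X) == [[1, 0], [0, 1]] for X in roots))   # True
print(len({tuple(map(tuple, X)) for X in roots}))             # 13
```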

10.3. Diagonal canonical forms under the orthogonal similarity group and the unitary similarity group

10.3.1. We begin this section by proving some rather special 
results which will presently be superseded. 

Theorem 10.3.1. If an $n \times n$ matrix $A$ possesses a set of complex (real) orthonormal characteristic vectors $x_1, \ldots, x_n$, and $S$ is the unitary (orthogonal) matrix having $x_1, \ldots, x_n$ as its columns, then $S^{-1}AS$ is a diagonal matrix.

The vectors $x_1, \ldots, x_n$ are orthonormal and therefore linearly independent. The assertion now follows at once by Theorem 10.2.1.

Exercise 10.3.1. Discuss the fallacy in the following argument. ‘Let $x_1, \ldots, x_n$ be linearly independent characteristic vectors of $A$. The space spanned by the characteristic vectors is then the total vector space $\mathfrak{V}_n$, and so we can select an orthonormal basis $\{x_1', \ldots, x_n'\}$ of $\mathfrak{V}_n$ consisting of characteristic vectors of $A$. Hence, by Theorem 10.3.1, it follows that every $n \times n$ matrix which possesses $n$ linearly independent characteristic vectors is unitarily similar to a diagonal matrix.’

Theorem 10.3.2. If $A$ is a hermitian matrix, then any two characteristic vectors associated with two distinct characteristic roots of $A$ are orthogonal to each other.




Let $\lambda$, $\mu$ be two distinct characteristic roots of $A$, and let $x$, $y$ be vectors such that $Ax = \lambda x$, $Ay = \mu y$. Since, by Theorem 7.5.1 (p. 209), $\lambda$ and $\mu$ are real, we have

$$\lambda \bar{x}'y = (\overline{\lambda x})'y = (\overline{Ax})'y = \bar{x}'\bar{A}'y = \bar{x}'Ay = \mu \bar{x}'y.$$

Now $\lambda \neq \mu$, and therefore $(x, y) = \bar{x}'y = 0$.

Theorem 10.3.3. Any hermitian (real symmetric) matrix with distinct characteristic roots is unitarily (orthogonally) similar to a diagonal matrix.

Let $\lambda_1, \ldots, \lambda_n$ be the (distinct) characteristic roots of a hermitian or real symmetric matrix $A$, and denote by $x_1, \ldots, x_n$ a set of unit characteristic vectors of $A$, associated respectively with $\lambda_1, \ldots, \lambda_n$. When $A$ is real and symmetric, real vectors can, of course, be chosen. Now, by Theorem 10.3.2, $x_1, \ldots, x_n$ constitute an orthonormal set, and the assertion therefore follows by Theorem 10.3.1.


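We now possess an actual procedure for transforming to diagonal form certain hermitian (real symmetric) matrices by means of a unitary (orthogonal) similarity transformation. Consider, for example, the real symmetric matrix

$$A = \begin{pmatrix} 1 & 0 & -4 \\ 0 & 5 & 4 \\ -4 & 4 & 3 \end{pmatrix}.$$

Its characteristic polynomial is

$$\begin{vmatrix} \lambda - 1 & 0 & 4 \\ 0 & \lambda - 5 & -4 \\ 4 & -4 & \lambda - 3 \end{vmatrix} = \lambda^3 - 9\lambda^2 - 9\lambda + 81 = (\lambda - 3)(\lambda + 3)(\lambda - 9),$$

and the characteristic roots are thus $3, -3, 9$. Now if $(x, y, z)'$ is a characteristic vector associated with the characteristic root $\lambda$, then

$$A(x, y, z)' = \lambda(x, y, z)',$$

i.e.

$$\begin{aligned} x - 4z &= \lambda x, \\ 5y + 4z &= \lambda y, \\ -4x + 4y + 3z &= \lambda z. \end{aligned}$$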




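From this it is easily found that unit characteristic vectors associated with the characteristic roots $3, -3, 9$ may be taken as

$$(\tfrac{2}{3}, \tfrac{2}{3}, -\tfrac{1}{3})', \quad (\tfrac{2}{3}, -\tfrac{1}{3}, \tfrac{2}{3})', \quad (\tfrac{1}{3}, -\tfrac{2}{3}, -\tfrac{2}{3})'$$

respectively. Consequently the orthogonal matrix

$$S = \frac{1}{3}\begin{pmatrix} 2 & 2 & 1 \\ 2 & -1 & -2 \\ -1 & 2 & -2 \end{pmatrix}$$

has the property that $S^{-1}AS = \mathrm{dg}(3, -3, 9)$.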
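The worked example can be checked in exact arithmetic. This Python sketch, whose helper names are ours, verifies that $S'S = I$, so that $S^{-1} = S'$. It also verifies that $S'AS = \mathrm{dg}(3, -3, 9)$.

```python
from fractions import Fraction


def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


A = [[1, 0, -4], [0, 5, 4], [-4, 4, 3]]
S = [[Fraction(x, 3) for x in row] for row in ([2, 2, 1], [2, -1, -2], [-1, 2, -2])]

I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(matmul(transpose(S), S) == I3)      # True: S is orthogonal
D = matmul(matmul(transpose(S), A), S)
print([int(D[i][i]) for i in range(3)])   # [3, -3, 9]
```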

10.3.2. We next turn to the more general problem in which the distinctness of characteristic roots is not presupposed. For the sake of simplicity we first deal with the real case, and in the proof of the next theorem we employ a type of argument that is found particularly effective in the discussion of canonical forms.† This argument will be used again for proving a number of subsequent results.

Theorem 10.3.4. (Orthogonal reduction of real symmetric 
matrices) 

Every real symmetric matrix is orthogonally similar to a diagonal 
matrix. 

This theorem contains, of course, Theorem 10.3.3 in so far as the
latter refers to real symmetric matrices. We shall first give an 
existence proof of Theorem 10.3.4 and then describe the process 
for constructing the transforming matrix. 

The proof is by induction with respect to $n$. For $n = 1$ the
assertion is true trivially. Assume, then, that it is true for $n-1$,
where $n \geq 2$. We shall show that it is then true for $n$.

If $\lambda_1$ is a characteristic root (necessarily real) of $A$, then there
exists a real unit vector $x_1$ such that

$$Ax_1 = \lambda_1 x_1. \tag{10.3.1}$$

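Let $S$ be an orthogonal matrix with $x_1$ as its first column.‡ Then, writing $(S^{-1})_{r*}$ for the $r$th row of $S^{-1}$ and $S_{*1}$ for the first column of $S$, and using (10.3.1), we obtain for $r = 1,\dots,n$,

$$(S^{-1}AS)_{r1} = (S^{-1})_{r*}AS_{*1} = (S^{-1})_{r*}Ax_1 = \lambda_1(S^{-1})_{r*}x_1 = \lambda_1(S^{-1})_{r*}S_{*1} = \lambda_1(S^{-1}S)_{r1} = \lambda_1 I_{r1} = \lambda_1\delta_{r1}.$$

† A similar idea was used in § 6.4.

‡ Such a matrix exists by Theorem 8.1.3 (p. 223).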



X,S10.3 DIAGONAL FORMS UNDER SIMILARITY GROUPS 303 
Moreover, since $A$ is symmetric, so is $S^{-1}AS = S'AS$. Hence
$(S^{-1}AS)_{1r} = \lambda_1\delta_{1r}$ $(r = 1,\dots,n)$.

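The matrix $B = S^{-1}AS$ thus has the form

$$B = \begin{pmatrix} \lambda_1 & O_{1,n-1} \\ O_{n-1,1} & B_1 \end{pmatrix},$$

where $B_1$ is a real symmetric matrix of order $n-1$. Now, by the induction hypothesis, there exist an orthogonal matrix $C_1$ and a diagonal matrix $\Lambda_1$, both of order $n-1$, such that $B_1C_1 = C_1\Lambda_1$. Hence

$$\begin{pmatrix} \lambda_1 & 0 \\ 0 & B_1 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & C_1 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & C_1 \end{pmatrix}\begin{pmatrix} \lambda_1 & 0 \\ 0 & \Lambda_1 \end{pmatrix},$$

or say $BC = C\Lambda$. Here $\Lambda$ and $C$ are both of order $n$; $\Lambda$ is obviously diagonal and $C$ is orthogonal by Exercise 8.2.2 (p. 230). We thus have

$$S^{-1}AS\cdot C = C\Lambda, \quad\text{i.e.}\quad (SC)^{-1}A(SC) = \Lambda;$$

and since $SC$ is orthogonal, the required result is established.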


Although Theorem 10.3.4 deals with a rather restricted class of 
matrices it will be seen, in § 12.2, to be the appropriate instrument 
for the study of quadratic forms, where it is precisely transformations
of real symmetric matrices that are relevant.

Let us now consider how an orthogonal matrix $S$ may be constructed such that $S^{-1}AS$ is diagonal. Denote the distinct characteristic roots of $A$ by $\lambda_1,\dots,\lambda_k$, and let $m_1,\dots,m_k$ be their multiplicities, so that $m_1+\dots+m_k = n$. Since $A$ is real and symmetric it is, by Theorem 10.3.4, similar to a diagonal matrix. Hence, by Theorem 10.2.3 (p. 294), the real vectors $x$ satisfying the equation $Ax = \lambda_i x$ constitute a vector space of dimensionality $m_i$. Denote by $\mathfrak{S}_i$ an orthonormal basis of this space. If $i \neq j$, then, by Theorem 10.3.2, any vector in $\mathfrak{S}_i$ is orthogonal to any vector in $\mathfrak{S}_j$. Thus the $n$ vectors contained in $\mathfrak{S}_1,\dots,\mathfrak{S}_k$ form a (real) orthonormal set. Hence, by Theorem 10.3.1, the orthogonal matrix $S$ having these vectors as its columns possesses the required property.†
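The construction just described can be sketched in floating point as follows. This is an illustration under stated assumptions, not a robust routine.

* The helper names `orthonormal_null_basis` and `orthogonal_diagonalizer` are hypothetical.
* The distinct roots are located with `numpy.linalg.eigvalsh`, and numerically equal roots are merged using an arbitrary tolerance.
* An orthonormal basis of each space $\{x : Ax = \lambda_i x\}$ is read off from a singular value decomposition, in place of the Schmidt process.

```python
import numpy as np

def orthonormal_null_basis(M, tol=1e-9):
    """Orthonormal basis, as columns, of {x : Mx = 0} for a real matrix M.

    The right singular vectors belonging to (numerically) zero singular
    values form such a basis.
    """
    _, s, vh = np.linalg.svd(M)
    rank = int(np.sum(s > tol))
    return vh[rank:].T

def orthogonal_diagonalizer(A, tol=1e-9, merge=1e-6):
    """Hypothetical helper: orthogonal S with S'AS diagonal, A real symmetric.

    The columns of S are orthonormal bases of the spaces {x : Ax = lambda_i x},
    taken one distinct characteristic root lambda_i at a time.
    """
    n = A.shape[0]
    distinct = []
    for lam in np.linalg.eigvalsh(A):      # real roots in ascending order
        if not distinct or lam - distinct[-1] > merge:
            distinct.append(lam)           # merge numerically equal roots
    blocks = [orthonormal_null_basis(A - lam * np.eye(n), tol) for lam in distinct]
    return np.hstack(blocks)

# Example with a repeated root: the characteristic roots are 1, 1, 4.
A = np.array([[2.0, 1.0, 1.0],
              [1.0, 2.0, 1.0],
              [1.0, 1.0, 2.0]])
S = orthogonal_diagonalizer(A)
print(np.round(S.T @ A @ S, 6))
```

For the repeated root 1 the SVD returns two orthonormal vectors at once. This is exactly the situation in which an orthonormal basis of the whole space, not just a single characteristic vector, is needed.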

The preceding argument establishes, in particular, the following 
result. 


† For a numerical illustration of the procedure described here, see § 12.2.2.




Theorem 10.3.5. If $A$ is a real symmetric matrix having characteristic roots $\lambda_1,\dots,\lambda_n$, then there exist characteristic vectors $x_1,\dots,x_n$ of $A$ which are associated with $\lambda_1,\dots,\lambda_n$ respectively and which constitute an orthonormal set.

Alternatively, we may infer this result from Theorem 10.3.4 by
observing that if $S^{-1}AS$ is diagonal, then the columns of $S$ are
characteristic vectors of $A$ associated with $\lambda_1,\dots,\lambda_n$ respectively.

We also note that the converse of Theorem 10.3.4 is true. This 
may be expressed more precisely as follows. 

Theorem 10.3.6. If a real matrix is orthogonally similar to a
diagonal matrix, then it is symmetric.

For, if $A$ is real, $S$ orthogonal, $\Lambda$ diagonal, and $S^{-1}AS = \Lambda$, then

$$A = S\Lambda S^{-1} = S\Lambda S',$$

and so $A' = A$, as asserted.†

We next consider complex matrices. A complex symmetric 
matrix need not be similar to a diagonal matrix. Thus, if the matrix 



(whose characteristic roots are 0, 0) were similar to a diagonal
matrix, we would have $S^{-1}AS = O$, i.e. $A = O$. However, a valid
analogue of Theorem 10.3.4 for complex matrices is not difficult to
obtain.
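For a concrete instance of this phenomenon, take the complex symmetric matrix $\begin{pmatrix} 1 & i \\ i & -1 \end{pmatrix}$. This is our own illustrative choice and need not be the matrix of the original example. Its square is $O$, so both of its characteristic roots are 0, yet it is not $O$. The NumPy sketch below confirms this and shows that its characteristic vectors span only a one-dimensional space.

```python
import numpy as np

# Our illustrative complex symmetric matrix (not necessarily the book's).
A = np.array([[1, 1j],
              [1j, -1]])

print(np.array_equal(A, A.T))          # symmetric: True
print(np.allclose(A, A.conj().T))      # hermitian: False
print(np.allclose(A @ A, 0))           # A^2 = O, so both roots are 0: True

# The characteristic vectors are the non-zero solutions of Ax = 0; they span
# a space of dimension n - R(A) = 1 < 2, so A is not similar to a diagonal
# matrix (a diagonal matrix with roots 0, 0 would be O, forcing A = O).
null_dim = A.shape[0] - np.linalg.matrix_rank(A)
print(null_dim)                        # 1
```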

Theorem 10.3.7. Every hermitian matrix is unitarily similar to 
a diagonal matrix. 

This result contains, of course, Theorem 10.3.3 in so far as that 
theorem relates to hermitian matrices, but it does not contain 
Theorem 10.3.4. However, the proof is virtually the same as that 
of Theorem 10.3.4, and we shall indicate it rather briefly. 

Let $\lambda_1$ be a characteristic root of the hermitian matrix $A$, and let $x_1$ be a unit (complex) characteristic vector associated with $\lambda_1$. If $U$ is a unitary matrix with $x_1$ as its first column, then

$$(U^{-1}AU)_{r1} = \lambda_1\delta_{r1} \qquad (r = 1,\dots,n).$$

Since, by Exercise 8.2.1 (p. 229), $U^{-1}AU$ is hermitian, and since $\lambda_1$ is real, it follows that

$$(U^{-1}AU)_{1r} = \lambda_1\delta_{1r} \qquad (r = 1,\dots,n).$$

† The condition that $A$ is real has not, in fact, been used and can be omitted
from Theorem 10.3.6.




Thus

$$U^{-1}AU = \begin{pmatrix} \lambda_1 & 0 \\ 0 & B_1 \end{pmatrix},$$

where $B_1$ is a hermitian matrix of order $n-1$. The proof is now
completed by induction with respect to $n$.

Exercise 10.3.2. Let $\omega_1,\dots,\omega_k$ be the distinct characteristic roots of the
hermitian $n\times n$ matrix $A$. Show that the minimum polynomial of $A$ is
$(\lambda-\omega_1)\cdots(\lambda-\omega_k)$.

If $A$ is a given hermitian matrix, then the construction of a
unitary matrix $S$ such that $S^{-1}AS$ is diagonal proceeds by virtually
the same steps as those described on p. 303. In the present case we
again obtain a set of $n$ orthonormal characteristic vectors of $A$,
and the matrix having these vectors as its columns is the required
matrix $S$.

Exercise 10.3.3. State and prove the analogue of Theorem 10.3.5 for 
hermitian matrices. 

10.3.3. It is possible to give a far-reaching generalization of 
Theorem 10.3.7. With this end in view we introduce the notion of 
a normal matrix. 

Definition 10.3.1. A (complex) matrix $A$ is NORMAL if

$$\bar{A}'A = A\bar{A}'.$$

Various types of matrices with which we are already familiar are, 
in fact, normal. Among these are diagonal matrices, unitary 
matrices, hermitian matrices, and skew-hermitian matrices. 
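Definition 10.3.1 and the examples just listed can be tested numerically. The sketch below uses NumPy; the function name `is_normal` and the random test matrices are our own. The unitary example is the Q factor of a QR decomposition. The last lines check, on a single instance, the invariance stated in Exercise 10.3.4 below.

```python
import numpy as np

def is_normal(A, atol=1e-10):
    """Test Definition 10.3.1: conj(A)' A = A conj(A)' (up to rounding)."""
    Ah = A.conj().T
    return np.allclose(Ah @ A, A @ Ah, atol=atol)

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
U, _ = np.linalg.qr(M)                     # Q factor: a unitary matrix

examples = {
    "diagonal": np.diag([1 + 2j, -3, 0.5j, 4]),
    "unitary": U,
    "hermitian": M + M.conj().T,
    "skew-hermitian": M - M.conj().T,
}
for name, X in examples.items():
    print(name, is_normal(X))              # all True

print("generic", is_normal(M))             # a random matrix is (almost surely) not normal
print("triangular", is_normal(np.array([[1.0, 1.0], [0.0, 1.0]])))  # False

# Exercise 10.3.4 on one instance: unitary similarity preserves normality.
N = examples["skew-hermitian"]
print(is_normal(U.conj().T @ N @ U))      # True
```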

Exercise 10.3.4. Verify that the property of normality is invariant
under unitary similarity transformations, i.e. if $A$ is normal and $U$ is unitary,
then $U^{-1}AU$ is normal.

The following necessary and sufficient condition for the existence 
of diagonal canonical forms under the unitary similarity group was 
found by Schur and Toeplitz in 1910. 

Theorem 10.3.8. A matrix is unitarily similar to a diagonal 
matrix if and only if it is normal. 

This result contains Theorem 10.3.7 (but not Theorem 10.3.4) 
as a special case. 

If $A$ is a given matrix and $U^{-1}AU = \Lambda$, where $U$ is unitary and
$\Lambda$ diagonal, then, by Exercise 10.3.4, it follows that $A$ is normal.

Suppose, on the other hand, that $A$ is normal. Let $\lambda_1$ be a
characteristic root of $A$, $x_1$ a unit characteristic vector associated
with $\lambda_1$, and $U$ any unitary matrix having $x_1$ as its first column. By




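precisely the same argument as that employed in the proof of
Theorem 10.3.4, we have

$$(U^{-1}AU)_{r1} = \lambda_1\delta_{r1} \qquad (r = 1,\dots,n).$$

Hence $V = U^{-1}AU$ may be written in the form

$$V = \begin{pmatrix} \lambda_1 & y' \\ 0 & B_1 \end{pmatrix},$$

where $y$ is a (column) vector of order $n-1$, and $B_1$ a square matrix of order $n-1$. Since

$$\bar{V}' = \begin{pmatrix} \bar{\lambda}_1 & 0 \\ \bar{y} & \bar{B}_1' \end{pmatrix},$$

we have

$$\bar{V}'V = \begin{pmatrix} \bar{\lambda}_1\lambda_1 & \bar{\lambda}_1 y' \\ \lambda_1\bar{y} & \bar{y}y' + \bar{B}_1'B_1 \end{pmatrix}, \qquad V\bar{V}' = \begin{pmatrix} \lambda_1\bar{\lambda}_1 + y'\bar{y} & y'\bar{B}_1' \\ B_1\bar{y} & B_1\bar{B}_1' \end{pmatrix}.$$

But $V$ is normal; hence $\bar{\lambda}_1\lambda_1 = \lambda_1\bar{\lambda}_1 + y'\bar{y}$, i.e. $y'\bar{y} = |y_1|^2+\dots+|y_{n-1}|^2 = 0$, and so $y = 0_{n-1}$. Therefore $\bar{B}_1'B_1 = B_1\bar{B}_1'$ and, furthermore, $V$ may be written in the form

$$V = \begin{pmatrix} \lambda_1 & 0 \\ 0 & B_1 \end{pmatrix},$$

where $B_1$ is a normal matrix of order $n-1$.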

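The proof is now completed by induction with respect to $n$. If the
theorem holds for $n-1$, then there exists a unitary matrix $C_1$ and
a diagonal matrix $\Lambda_1$, both of order $n-1$, such that $B_1C_1 = C_1\Lambda_1$.
Hence

$$\begin{pmatrix} \lambda_1 & 0 \\ 0 & B_1 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & C_1 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & C_1 \end{pmatrix}\begin{pmatrix} \lambda_1 & 0 \\ 0 & \Lambda_1 \end{pmatrix},$$

or say $VC = C\Lambda$, where $\Lambda$ is diagonal, $C$ is unitary, and both are
of order $n$. Thus

$$(UC)^{-1}A(UC) = \Lambda,$$

and the proof is complete since $UC$ is unitary.†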

Exercise 10.3.5. (i) Show that all characteristic roots of a normal matrix
are regular. (ii) Show that the characteristic vectors of a normal $n\times n$
matrix span the total vector space.


10.4. Triangular canonical forms 

10.4.1. We have so far confined our attention to diagonal 
canonical forms. In general, however, a matrix does not possess 
such a form. On the other hand, we may, in every case, assert the


† For an alternative proof see Perlis, 6, 194.



TRIANGULAR CANONICAL FORMS 


307 


X, § 10.4 

existence of a triangular canonical form. The initial steps towards 
the proof of this result were taken by Jacobi in a paper published 
posthumously in 1857. The theorem to be proved next is due, in 
its present form, to Schur (1909). 

Theorem 10.4.1. Every matrix is unitarily similar to a triangular
matrix. 

This statement is equally valid whether the term ‘triangular’ is 
taken to mean ‘upper triangular’ or ‘lower triangular’. We shall 
prove it in the first instance for the former interpretation and then 
deduce at once that it continues to hold for the latter. 

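The argument is by induction with respect to $n$ and is of the type
with which we are now familiar from the proofs of Theorems 10.3.4,
10.3.7, and 10.3.8. Let us assume that the assertion is true for $n-1$,
where $n \geq 2$. Let $\lambda_1$ be a characteristic root of $A$, $x_1$ a unit characteristic vector (generally complex) of $A$ associated with $\lambda_1$, $U$ a
unitary matrix with $x_1$ as its first column, and $V = U^{-1}AU$. Then
$V_{r1} = \lambda_1\delta_{r1}$ $(r = 1,\dots,n)$, and so

$$V = \begin{pmatrix} \lambda_1 & y' \\ 0 & A_1 \end{pmatrix},$$

where $y$ is a vector of order $n-1$ and $A_1$ a square matrix of order
$n-1$. By the induction hypothesis there exists a unitary matrix
$C_1$ and an upper triangular matrix $\Lambda_1$, both of order $n-1$, such that
$A_1C_1 = C_1\Lambda_1$. Hence

$$\begin{pmatrix} \lambda_1 & y' \\ 0 & A_1 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & C_1 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & C_1 \end{pmatrix}\begin{pmatrix} \lambda_1 & y'C_1 \\ 0 & \Lambda_1 \end{pmatrix},$$

and therefore $U^{-1}AU\cdot C = C\Lambda$, where $C$ is unitary and $\Lambda$ upper
triangular. The theorem now follows at once for upper triangular
canonical forms.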

The case of lower triangular canonical forms is an easy corollary.
For if $A$ is any given matrix there exists, as we have just shown, a
unitary matrix $W$ and an upper triangular matrix $T$ such that
$W^{-1}A'W = T$. Hence, on taking transposes and noting that $W' = \bar{W}^{-1}$, we find $U^{-1}AU = \Lambda$, where $U = \bar{W}$ is unitary and
$\Lambda = T'$ is lower triangular.
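The inductive proof of Theorem 10.4.1 translates almost line by line into a recursive procedure. The sketch below is a floating-point illustration resting on our own assumptions.

* `numpy.linalg.eig` supplies a characteristic vector at each step.
* A unitary matrix with a prescribed first column is obtained from a QR factorization of $[x_1 \mid I]$. This construction is ours; the text only appeals to the existence of such a matrix.
* The helper names are hypothetical.

For serious computation one would instead use a library routine such as `scipy.linalg.schur`, which rests on the QR algorithm rather than on this recursion.

```python
import numpy as np

def unitary_with_first_column(x):
    """A unitary matrix whose first column is the unit vector x.

    QR-factorize [x | I]; the first column of Q equals x up to the
    unimodular factor R[0, 0], which is then cancelled.
    """
    n = x.shape[0]
    Q, R = np.linalg.qr(np.column_stack([x, np.eye(n, dtype=complex)]))
    Q[:, 0] *= R[0, 0] / abs(R[0, 0])
    return Q

def schur_triangularize(A):
    """Return (U, T), U unitary, T = U^{-1} A U upper triangular,
    by the induction used in the proof of Theorem 10.4.1."""
    A = np.asarray(A, dtype=complex)
    n = A.shape[0]
    if n == 1:
        return np.eye(1, dtype=complex), A.copy()
    _, vecs = np.linalg.eig(A)
    x1 = vecs[:, 0] / np.linalg.norm(vecs[:, 0])   # unit characteristic vector
    U = unitary_with_first_column(x1)
    V = U.conj().T @ A @ U            # first column is (lambda_1, 0, ..., 0)'
    C1, _ = schur_triangularize(V[1:, 1:])         # the matrix A_1 of order n-1
    C = np.eye(n, dtype=complex)
    C[1:, 1:] = C1                    # C = dg(1, C_1)
    UC = U @ C
    return UC, UC.conj().T @ A @ UC

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))       # real, but its roots may be complex
U, T = schur_triangularize(A)
print(np.round(np.abs(np.tril(T, -1)).max(), 12))  # entries below diagonal ~ 0
```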

It is instructive to consider an alternative proof of Theorem 10.4.1 which
uses only invariant notions. We assert, in the first place, that if $L$ is a linear
transformation of $\mathfrak{V}_n$ into itself, then there exists a basis $\{y_1,\dots,y_n\}$ of $\mathfrak{V}_n$ such
that, for $k = 1,\dots,n$, $Ly_k$ is a linear combination of $y_1,\dots,y_k$.




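To prove this, let $\lambda_1$ be a characteristic root of the matrix representing $L$
with respect to $\mathfrak{E}$, and let $y_1$ be any characteristic vector of $L$ associated with $\lambda_1$, so that

$$Ly_1 = \lambda_1 y_1. \tag{10.4.1}$$

Denote by $\mathfrak{U}$ the space consisting of all scalar multiples of $y_1$, and let $\mathfrak{U}'$ be
any complement of $\mathfrak{U}$. Then, given any $y \in \mathfrak{U}'$, there exists a unique scalar
$\alpha(y)$ such that $Ly - \alpha(y)y_1 \in \mathfrak{U}'$.

It is evident that

$$\alpha(ty) = t\alpha(y), \qquad \alpha(y+y') = \alpha(y)+\alpha(y'),$$

and therefore the mapping $M$ of $\mathfrak{U}'$ into itself, defined by the formula

$$My = Ly - \alpha(y)y_1 \qquad (y \in \mathfrak{U}') \tag{10.4.2}$$

is linear. Since $\mathfrak{U}'$ is a complement of $\mathfrak{U}$, we have $d(\mathfrak{U}') = n-1$, and so $\mathfrak{U}'$
is isomorphic to $\mathfrak{V}_{n-1}$. We define a mapping $M^*$ of $\mathfrak{V}_{n-1}$ into itself by the
requirement that, whenever $y \in \mathfrak{U}'$ and $\eta \in \mathfrak{V}_{n-1}$ correspond under this isomorphism, $M^*\eta$ corresponds to $My$. Then
$M^*$ is evidently linear.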

We now argue by induction with respect to $n$. The assertion is true
trivially for $n = 1$. Assume that it is true for $n-1$, where $n \geq 2$. Then $\mathfrak{V}_{n-1}$ possesses
a basis $\{\eta_2,\dots,\eta_n\}$ such that, for $k = 2,\dots,n$, $M^*\eta_k$ is a linear combination
of $\eta_2,\dots,\eta_k$, say

$$M^*\eta_k = \lambda_{k2}\eta_2+\dots+\lambda_{kk}\eta_k \qquad (k = 2,\dots,n).$$

Denote by $y_2,\dots,y_n$ the vectors in $\mathfrak{U}'$ corresponding to $\eta_2,\dots,\eta_n$ respectively.
Then $\{y_2,\dots,y_n\}$ is a basis of $\mathfrak{U}'$ and

$$My_k = \lambda_{k2}y_2+\dots+\lambda_{kk}y_k \qquad (k = 2,\dots,n). \tag{10.4.3}$$

Now $\{y_1, y_2,\dots,y_n\}$ is a basis of $\mathfrak{V}_n$ and in view of (10.4.1), (10.4.2), and
(10.4.3) it satisfies the required condition. Thus our assertion is proved.

We know by Schmidt's orthogonalization process (Theorem 2.5.6, p. 67)
that $\mathfrak{V}_n$ possesses an orthonormal basis $\mathfrak{X} = \{x_1,\dots,x_n\}$ such that

$$x_k = c_{k1}y_1+\dots+c_{kk}y_k, \quad c_{kk} \neq 0 \qquad (k = 1,\dots,n).$$

In view of the result just proved it follows that, for $k = 1,\dots,n$, $Lx_k$ is a
linear combination of $x_1,\dots,x_k$.

Let, now, $A$ be a given matrix and let the linear transformation $L$ be
defined by the equation $A = \mathfrak{A}(L; \mathfrak{E})$. If $X$ is the (unitary) matrix having
$x_1,\dots,x_n$ as its columns, then, by Theorem 4.2.8 (p. 121), $X^{-1}AX = \mathfrak{A}(L; \mathfrak{X})$.
Writing $X^{-1}AX = (b_{rs})$, we then have

$$Lx_k = b_{1k}x_1+\dots+b_{nk}x_n \qquad (k = 1,\dots,n).$$

But $Lx_k$ is a linear combination of $x_1,\dots,x_k$ only. Hence $b_{rk} = 0$ $(r > k)$,
and so $X^{-1}AX$ is triangular. Theorem 10.4.1 is therefore proved once again.

Even if $A$ is real its characteristic roots may be complex, and it
is therefore not generally possible to find a real unitary (i.e. an
orthogonal) matrix $U$ such that $U^{-1}AU$ is triangular. However,
by adapting the argument used in proving Theorem 10.4.1, we
obtain without difficulty the following companion result.




Theorem 10.4.2. A real matrix is orthogonally similar to a 
triangular matrix if and only if all its characteristic roots are real. 

Exercise 10.4.1. Write out a proof of this result.

It is interesting to observe that the main theorem on diagonal 
canonical forms previously established in § 10.3.3 follows once again 
from results on triangular canonical forms. To demonstrate this 
we require a simple preliminary result. 

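Theorem 10.4.3. A triangular matrix is normal if and only if
it is diagonal.

Assume, without loss of generality, that the given normal matrix
$A = (a_{rs})$ is of the upper triangular type. If $A$ is non-diagonal,
let $r$ be the suffix of the first row in $A$ which contains an element
$a_{rs} \neq 0$ such that $r < s$. Then $a_{kr} = 0$ for $k \neq r$ (for $k > r$ because $A$ is upper triangular, and for $k < r$ because no earlier row contains a non-zero element off the diagonal), and we have

$$(\bar{A}'A)_{rr} = \sum_{k=1}^{n}|a_{kr}|^2 = |a_{rr}|^2,$$

$$(A\bar{A}')_{rr} = \sum_{k=1}^{n}|a_{rk}|^2 \geq |a_{rr}|^2+|a_{rs}|^2 > |a_{rr}|^2.$$

Hence $\bar{A}'A \neq A\bar{A}'$, and our assertion is therefore proved.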

Theorem 10.3.8 can now be derived as follows. Let $A$ be a
normal matrix. By Theorem 10.4.1 there exists a unitary matrix
$U$ and a triangular matrix $\Lambda$ such that $U^{-1}AU = \Lambda$. Hence, by Exercise 10.3.4, $\Lambda$ is
normal, and therefore diagonal by Theorem 10.4.3. Thus any
normal matrix is unitarily similar to a diagonal matrix; and
Theorem 10.3.8 follows.

10.4.2. Schur, in 1909, used the triangular canonical form to 
derive a number of interesting inequalities. 

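Theorem 10.4.4. If $A = (a_{rs})$ is any complex $n\times n$ matrix and
$\lambda_1,\dots,\lambda_n$ are its characteristic roots, then

$$\sum_{r=1}^{n}|\lambda_r|^2 \leq \sum_{r,s=1}^{n}|a_{rs}|^2, \tag{10.4.4}$$

$$\sum_{r=1}^{n}|\Re\lambda_r|^2 \leq \sum_{r,s=1}^{n}\left|\tfrac{1}{2}(a_{rs}+\bar{a}_{sr})\right|^2, \tag{10.4.5}$$

$$\sum_{r=1}^{n}|\Im\lambda_r|^2 \leq \sum_{r,s=1}^{n}\left|\tfrac{1}{2}(a_{rs}-\bar{a}_{sr})\right|^2. \tag{10.4.6}$$

Equality in any one of these relations implies equality in all three and
occurs if and only if $A$ is normal.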




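We note, in the first place, that if $X$ is any matrix, then

$$\operatorname{tr}(\bar{X}'X) = \sum_{s=1}^{n}(\bar{X}'X)_{ss} = \sum_{s=1}^{n}\sum_{r=1}^{n}\bar{X}_{rs}X_{rs} = \sum_{r,s=1}^{n}|X_{rs}|^2,$$

i.e.

$$\operatorname{tr}(\bar{X}'X) = \sum_{r,s=1}^{n}|X_{rs}|^2. \tag{10.4.7}$$

Next, suppose that $C$, $K$, $U$ are matrices such that

$$C = UKU^{-1}, \qquad U \text{ unitary}. \tag{10.4.8}$$

Then $\bar{C}'C = U\bar{K}'KU^{-1}$ and so $\operatorname{tr}(\bar{C}'C) = \operatorname{tr}(\bar{K}'K)$. Hence, by
(10.4.7),

$$\sum_{r,s=1}^{n}|C_{rs}|^2 = \sum_{r,s=1}^{n}|K_{rs}|^2. \tag{10.4.9}$$

Now, by Theorem 10.4.1, there exists a unitary matrix $U$ and
an upper triangular matrix $\Delta = (d_{rs})$ such that

$$A = U\Delta U^{-1}. \tag{10.4.10}$$

Here evidently $d_{rr} = \lambda_r$ $(r = 1,\dots,n)$. Since (10.4.8) implies
(10.4.9), we have

$$\sum_{r,s=1}^{n}|a_{rs}|^2 = \sum_{r,s=1}^{n}|d_{rs}|^2 = \sum_{r=1}^{n}|\lambda_r|^2 + \sum_{r<s}|d_{rs}|^2 \geq \sum_{r=1}^{n}|\lambda_r|^2.$$

Again, by (10.4.10),

$$\tfrac{1}{2}(A \pm \bar{A}') = U\cdot\tfrac{1}{2}(\Delta \pm \bar{\Delta}')\cdot U^{-1},$$

and hence, by (10.4.9),

$$\sum_{r,s=1}^{n}\left|\tfrac{1}{2}(a_{rs} \pm \bar{a}_{sr})\right|^2 = \sum_{r,s=1}^{n}\left|\tfrac{1}{2}(d_{rs} \pm \bar{d}_{sr})\right|^2 = \sum_{r=1}^{n}\left|\tfrac{1}{2}(\lambda_r \pm \bar{\lambda}_r)\right|^2 + \sum_{r\neq s}\left|\tfrac{1}{2}(d_{rs} \pm \bar{d}_{sr})\right|^2 \geq \sum_{r=1}^{n}\left|\tfrac{1}{2}(\lambda_r \pm \bar{\lambda}_r)\right|^2.$$

Since $\tfrac{1}{2}(\lambda_r+\bar{\lambda}_r) = \Re\lambda_r$ and $|\tfrac{1}{2}(\lambda_r-\bar{\lambda}_r)| = |\Im\lambda_r|$, the three inequalities are therefore established.

Equality in (10.4.4) evidently occurs if and only if $d_{rs} = 0$
$(r < s)$; it occurs in (10.4.5) if and only if $d_{rs}+\bar{d}_{sr} = 0$ $(r \neq s)$, and
in (10.4.6) if and only if $d_{rs}-\bar{d}_{sr} = 0$ $(r \neq s)$. Now since $d_{rs} = 0$
for $r > s$, it follows that each of the above conditions is equivalent
to the requirement that $\Delta$ should be diagonal. The proof is therefore complete in view of Theorem 10.3.8.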
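Schur's inequalities are easy to test numerically. The NumPy sketch below uses a hypothetical function `schur_gaps` that returns the right-hand side minus the left-hand side of each of (10.4.4), (10.4.5), and (10.4.6). Two cases are checked:

* For a random complex matrix, which is not normal, all three gaps come out positive.
* For a normal matrix constructed as $Q\Lambda\bar{Q}'$ with $Q$ unitary and $\Lambda$ diagonal, the gaps vanish up to rounding error, as the equality clause of Theorem 10.4.4 predicts.

```python
import numpy as np

def schur_gaps(A):
    """Right-hand side minus left-hand side of (10.4.4), (10.4.5), (10.4.6)."""
    lam = np.linalg.eigvals(A)
    Ah = A.conj().T                   # entry (r, s) of Ah is conj(a_sr)
    gap4 = np.sum(np.abs(A) ** 2) - np.sum(np.abs(lam) ** 2)
    gap5 = np.sum(np.abs((A + Ah) / 2) ** 2) - np.sum(lam.real ** 2)
    gap6 = np.sum(np.abs((A - Ah) / 2) ** 2) - np.sum(lam.imag ** 2)
    return gap4, gap5, gap6

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
print(schur_gaps(A))                  # three positive numbers: A is not normal

# A normal matrix Q dg(...) Q^{-1}, Q unitary: all three gaps vanish.
Q, _ = np.linalg.qr(A)
N = Q @ np.diag([1 + 1j, 2, -3j, 0.5]) @ Q.conj().T
print(np.round(schur_gaps(N), 10))    # approximately (0, 0, 0)
```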

Schur’s inequalities imply a number of further results with most of which 
we are already familiar. 

(i) The inequalities of Theorem 10.4.4 obviously imply the weaker 
inequalities of Theorem 7.6.3 (p. 211). 





(ii) Theorem 7.6.2 may be deduced from (10.4.6) as follows. If $A = (a_{rs})$
is real, then

$$\sum_{r=1}^{n}|\Im\lambda_r|^2 \leq \sum_{r,s=1}^{n}\left|\tfrac{1}{2}(a_{rs}-a_{sr})\right|^2 \leq \alpha^2 n(n-1), \tag{10.4.11}$$

where

$$\alpha = \max_{r,s}\tfrac{1}{2}|a_{rs}-a_{sr}|.$$

Let $\lambda$ be any characteristic root of $A$. Since the non-real characteristic roots
of a real matrix occur in conjugate pairs, it follows that every non-vanishing
value of $|\Im\lambda_r|^2$ occurs at least twice on the left-hand side of (10.4.11). Hence

$$2|\Im\lambda|^2 \leq \alpha^2 n(n-1),$$

and Theorem 7.6.2 follows.

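(iii) Using the inequality of the arithmetic and geometric means and
(10.4.4), we obtain

$$|\det A|^2 = |\lambda_1|^2\cdots|\lambda_n|^2 \leq \left(\frac{|\lambda_1|^2+\dots+|\lambda_n|^2}{n}\right)^{n} \leq \left(\frac{1}{n}\sum_{r,s=1}^{n}|a_{rs}|^2\right)^{n}.$$

If $\max|a_{rs}| = \rho$, then

$$|\det A|^2 \leq (n\rho^2)^n, \qquad\text{i.e.}\qquad |\det A| \leq n^{n/2}\rho^n.$$

An alternative derivation of this inequality will be given in § 13.5.

Exercise 10.4.2. Show that the weaker inequality $|\det A| \leq n!\,\rho^n$ is
trivial.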


(iv) Let $A = (a_{rs})$ be a unitary matrix. Then, by (10.4.7),

$$\sum_{r,s=1}^{n}|a_{rs}|^2 = \operatorname{tr}(\bar{A}'A) = \operatorname{tr} I = n,$$

and so, by (10.4.4),

$$\frac{1}{n}\sum_{r=1}^{n}|\lambda_r|^2 \leq 1.$$

Now the determinant of $A$ has modulus 1, and therefore

$$1 = |\lambda_1\cdots\lambda_n|^{2/n} = \{|\lambda_1|^2\cdots|\lambda_n|^2\}^{1/n} \leq \frac{1}{n}\{|\lambda_1|^2+\dots+|\lambda_n|^2\} \leq 1.$$

Hence

$$\{|\lambda_1|^2\cdots|\lambda_n|^2\}^{1/n} = \frac{1}{n}\{|\lambda_1|^2+\dots+|\lambda_n|^2\}.$$

But the inequality of the arithmetic and geometric means reduces to an
equality if and only if the numbers involved are all equal. Hence

$$|\lambda_1| = \dots = |\lambda_n| = 1.$$

Thus every characteristic root of a unitary matrix is of modulus 1, a result
with which we are, of course, already familiar. Moreover, we have

$$\sum_{r=1}^{n}|\lambda_r|^2 = n = \sum_{r,s=1}^{n}|a_{rs}|^2,$$

and thus there is equality in (10.4.4). Hence any unitary matrix is unitarily
similar to a diagonal matrix. This result (which is a special case of Theorem
10.3.8) was first discovered by Frobenius in 1883.




10.5. An intermediate canonical form 

As we know, not every matrix is similar to a diagonal matrix. 
At the same time the knowledge of triangular canonical forms does 
not provide us with an adequate foundation for the study of the 
deeper properties of matrices, and some more special canonical 
form is therefore needed. It is, in point of fact, possible to show that 
every matrix A is similar to an ‘almost diagonal’ matrix C. This 
means, to be rather more precise, that the diagonal elements of C 
are equal to the characteristic roots of A, the elements immediately 
above the diagonal are equal to 1 or 0, and all remaining elements 
are equal to 0. The matrix C is known as the classical canonical 
form of A, but the proof that every A is similar to a matrix C of the 
type just specified cannot be given here.† We propose, however,
to derive a canonical form which is intermediate in type between 
the triangular canonical form (under the similarity group) and 
the classical canonical form. 

Throughout the discussion below, $A$ denotes a given $n\times n$
matrix, of which $\lambda_1,\dots,\lambda_k$ are the distinct characteristic roots
occurring with respective multiplicities $m_1,\dots,m_k$. Furthermore,
$\mathfrak{U}_\lambda$ denotes the vector space of vectors $x$ such that

$$(\lambda I - A)^n x = 0.$$

We require a number of preliminary results. 

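Theorem 10.5.1. For every number $\lambda$ we have

$$d(\mathfrak{U}_\lambda) = m_\lambda(A).$$

From the discussion on p. 279 we know that $(\lambda I - A)^n$ is a group
matrix. Hence, by Theorem 9.4.2,‡

$$R\{(\lambda I - A)^n\} = n - m_0\{(\lambda I - A)^n\} = n - m_0(\lambda I - A) = n - m_\lambda(A),$$

and therefore

$$d(\mathfrak{U}_\lambda) = n - R\{(\lambda I - A)^n\} = m_\lambda(A).$$

Corollary. $d(\mathfrak{U}_{\lambda_1})+\dots+d(\mathfrak{U}_{\lambda_k}) = n$.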
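Theorem 10.5.1 can be checked on a small example by computing $n - R\{(\lambda I - A)^n\}$ directly. In the NumPy sketch below, `dim_U` is a hypothetical helper. The example matrix has the root 2 with multiplicity 2, but only a one-dimensional space of characteristic vectors for that root, so $\mathfrak{U}_2$ is strictly larger than that space. The numerical rank is trustworthy here only because the entries are small integers and the roots are known exactly. With approximate roots, a rank test of this kind is fragile.

```python
import numpy as np

def dim_U(A, lam, tol=1e-8):
    """d(U_lam) = n - R{(lam I - A)^n}: dimension of {x : (lam I - A)^n x = 0}."""
    n = A.shape[0]
    M = np.linalg.matrix_power(lam * np.eye(n) - A, n)
    return n - np.linalg.matrix_rank(M, tol=tol)

A = np.array([[2.0, 1.0, 0.0],
              [0.0, 2.0, 0.0],
              [0.0, 0.0, 5.0]])       # roots 2, 2, 5

for lam in (2.0, 5.0, 7.0):
    print(lam, dim_U(A, lam))         # 2, 1, 0: the multiplicities m_lam(A)

# Characteristic vectors for the root 2 span only a 1-dimensional space:
print(3 - np.linalg.matrix_rank(2.0 * np.eye(3) - A))   # 1
```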

Theorem 10.5.2. Let $x$ be a non-zero vector. Then there exists a
unique non-constant monic polynomial $\phi$ having minimum degree
and such that $\phi(A)x = 0$.

† Classical canonical forms and related questions constitute one of the central
topics in the theory of matrices and are discussed fully in many textbooks. For 
references see the footnote on p. 290. 

‡ The appeal to the results of Chapter IX is not essential since it is not very
difficult to show directly that, for every $n\times n$ matrix $B$, $R(B^n) = n - m_0(B)$.



X,§10.5 AN INTERMEDIATE CANONICAL FORM 313 

The $n+1$ vectors $x, Ax,\dots,A^nx$ are linearly dependent and there
exist, therefore, scalars $c_0, c_1,\dots,c_n$, not all 0, such that

$$c_0x+c_1Ax+\dots+c_nA^nx = 0.$$

In fact, since $x \neq 0$, not all scalars $c_1,\dots,c_n$ are 0, and thus the non-constant polynomial $f(t) = c_0+c_1t+\dots+c_nt^n$ has the property
that $f(A)x = 0$. Among all such polynomials there is at least one,
say $\phi$, which is monic and has minimum degree. Then $\phi$ is unique,
for if $\phi_1 \neq \phi$ is another polynomial having these properties, then
$\{\phi(A)-\phi_1(A)\}x = 0$. But $\phi-\phi_1$ is of lower degree than $\phi$ and we
therefore have a contradiction.

Definition 10.5.1. (i) Let $x \neq 0$. Any non-constant polynomial
$f$ such that $f(A)x = 0$ is an ANNIHILATING POLYNOMIAL of $x$.
(ii) The unique polynomial $\phi$ of Theorem 10.5.2 is the MINIMUM
ANNIHILATING POLYNOMIAL of $x$.

Theorem 10.5.3. Let $x \neq 0$. Then the minimum annihilating
polynomial of $x$ divides every annihilating polynomial of $x$.

Let $\phi$ be the minimum annihilating polynomial of $x$, and $f$ any
annihilating polynomial of $x$. Then there exist polynomials $g$, $h$
such that $f = g\phi+h$, where $h$ either vanishes identically or else is
of lower degree than $\phi$. We then have

$$0 = f(A)x = \{g(A)\phi(A)+h(A)\}x = h(A)x.$$

Thus we cannot have $h \neq 0$, for this would be in contradiction to
the minimal definition of $\phi$. Hence $h = 0$, $f = g\phi$, and the theorem
is proved.†
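The proof of Theorem 10.5.2 suggests a direct computation: run through $x, Ax, A^2x,\dots$ and stop at the first vector that is a linear combination of its predecessors. The sketch below does this in exact rational arithmetic with Python's standard `fractions` module. The function names and the incremental elimination scheme are our own choices, and coefficients are returned lowest degree first.

```python
from fractions import Fraction

def mat_vec(A, x):
    """Product A x for a matrix given as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def min_annihilating_poly(A, x):
    """Hypothetical helper: coefficients [c_0, c_1, ..., 1] (lowest degree
    first) of the minimum annihilating polynomial of the non-zero vector x."""
    A = [[Fraction(a) for a in row] for row in A]
    v = [Fraction(a) for a in x]
    if all(c == 0 for c in v):
        raise ValueError("x must be non-zero")
    # Each entry: (pivot index, reduced vector, its coefficients with respect
    # to x, Ax, A^2 x, ...).  Later entries vanish at earlier pivots.
    basis = []
    k = 0                                   # v holds A^k x
    while True:
        r, c = v[:], [Fraction(0)] * k + [Fraction(1)]
        for p, b, cb in basis:
            if r[p] != 0:
                f = r[p] / b[p]
                r = [ri - f * bi for ri, bi in zip(r, b)]
                cb = cb + [Fraction(0)] * (len(c) - len(cb))
                c = [ci - f * bi for ci, bi in zip(c, cb)]
        if all(ri == 0 for ri in r):
            return c                        # sum_j c_j A^j x = 0, and c_k = 1
        p = next(i for i, ri in enumerate(r) if ri != 0)
        basis.append((p, r, c))
        v = mat_vec(A, v)
        k += 1

A = [[2, 1, 0],
     [0, 2, 0],
     [0, 0, 5]]
for x in ([1, 0, 0], [0, 1, 0], [0, 1, 1]):
    print(x, [str(c) for c in min_annihilating_poly(A, x)])
```

For this matrix the three vectors give $t-2$, $(t-2)^2$, and $(t-2)^2(t-5)$ respectively. As Theorem 10.5.3 requires, each of these divides the annihilating polynomial $(t-2)^2(t-5)$, which annihilates every vector.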

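Theorem 10.5.4. Let $\mathfrak{B}_1,\dots,\mathfrak{B}_k$ be any bases of $\mathfrak{U}_{\lambda_1},\dots,\mathfrak{U}_{\lambda_k}$
respectively. Then the $n$ vectors contained in these bases constitute a
basis of $\mathfrak{V}_n$.

If $k = 1$, then $\mathfrak{U}_{\lambda_1} = \mathfrak{V}_n$ by Theorem 10.5.1, and the theorem is
therefore true. Next, assume that $k > 1$, and write

$$\mathfrak{B}_i = \{x_{i1},\dots,x_{im_i}\} \qquad (i = 1,\dots,k).$$

Let $\alpha_{ij}$ be scalars such that

$$\sum_{i,j}\alpha_{ij}x_{ij} = 0,$$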

† It will be recalled that precisely the same reasoning was used in the proof of
Theorem 7.4.2 (p. 204). 



where $i$ ranges over the values $1,\dots,k$ and, for each $i$, $j$ ranges over
the values $1,\dots,m_i$. Write

$$y_i = \sum_{j=1}^{m_i}\alpha_{ij}x_{ij} \qquad (i = 1,\dots,k).$$

Then $y_i \in \mathfrak{U}_{\lambda_i}$ $(i = 1,\dots,k)$, and

$$y_1+\dots+y_k = 0. \tag{10.5.1}$$

It is sufficient to show that $y_1 = \dots = y_k = 0$, for this implies that
all $\alpha$'s vanish and the theorem then follows at once.

Premultiplying (10.5.1) by $(\lambda_2I-A)^n\cdots(\lambda_kI-A)^n$, we obtain

$$(\lambda_2I-A)^n\cdots(\lambda_kI-A)^ny_1+(\lambda_2I-A)^n\cdots(\lambda_kI-A)^ny_2+\dots = 0.$$

The second term on the left is equal to

$$(\lambda_3I-A)^n\cdots(\lambda_kI-A)^n(\lambda_2I-A)^ny_2,$$

and this vanishes since $y_2 \in \mathfrak{U}_{\lambda_2}$. Similarly the third, ..., $k$th terms
also vanish, and therefore

$$(\lambda_2I-A)^n\cdots(\lambda_kI-A)^ny_1 = 0. \tag{10.5.2}$$

Suppose that $y_1 \neq 0$, and denote by $\phi$ the minimum annihilating
polynomial of $y_1$. Then, by (10.5.2) and Theorem 10.5.3,

$$\phi(t) \text{ divides } (\lambda_2-t)^n\cdots(\lambda_k-t)^n. \tag{10.5.3}$$

Again, $y_1 \in \mathfrak{U}_{\lambda_1}$, i.e. $(\lambda_1I-A)^ny_1 = 0$; and therefore

$$\phi(t) \text{ divides } (\lambda_1-t)^n. \tag{10.5.4}$$

Since, however, $\lambda_1,\dots,\lambda_k$ are distinct by hypothesis, there exists no
non-constant polynomial $\phi$ satisfying both (10.5.3) and (10.5.4).
We thus arrive at a contradiction, and it follows that $y_1 = 0$. By
symmetry $y_2 = \dots = y_k = 0$, and the proof of the theorem is
therefore complete.


In the statement and proof of the main theorem below we shall
make use of the following notation. If $C_1,\dots,C_k$ are square matrices
of orders $m_1,\dots,m_k$ respectively, then we denote by

$$\operatorname{dg}(C_1,\dots,C_k)$$

the square matrix $C$, of order $m_1+\dots+m_k$, of which $C_1,\dots,C_k$ are
submatrices such that the diagonal of $C$ consists of the diagonals
of $C_1,\dots,C_k$ (in that order), while every element of $C$ which does not
belong to any of $C_1,\dots,C_k$ is equal to 0. Thus, for example, if


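$$D = \begin{pmatrix} d_{11} & d_{12} \\ d_{21} & d_{22} \end{pmatrix}, \qquad E = \begin{pmatrix} e_{11} & e_{12} & e_{13} \\ e_{21} & e_{22} & e_{23} \\ e_{31} & e_{32} & e_{33} \end{pmatrix},$$

then

$$\operatorname{dg}(D, E) = \begin{pmatrix} d_{11} & d_{12} & 0 & 0 & 0 \\ d_{21} & d_{22} & 0 & 0 & 0 \\ 0 & 0 & e_{11} & e_{12} & e_{13} \\ 0 & 0 & e_{21} & e_{22} & e_{23} \\ 0 & 0 & e_{31} & e_{32} & e_{33} \end{pmatrix}.$$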


If $C_1,\dots,C_k$ are all $1\times1$ matrices, then $\operatorname{dg}(C_1,\dots,C_k)$ is simply a
diagonal matrix, and the notation in this case reduces to one that
is already familiar.

We may note that if, for $i = 1,\dots,k$, the order of $C_i$ is the same
as that of $C_i'$, then

$$\operatorname{dg}(C_1,\dots,C_k)\cdot\operatorname{dg}(C_1',\dots,C_k') = \operatorname{dg}(C_1C_1',\dots,C_kC_k').$$
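The block-diagonal construction $\operatorname{dg}(C_1,\dots,C_k)$ and the product rule just stated can be checked with a few lines of NumPy. The helper `dg` below is our own. SciPy also provides a ready-made equivalent, `scipy.linalg.block_diag`.

```python
import numpy as np

def dg(*blocks):
    """dg(C_1, ..., C_k): square blocks down the diagonal, zeros elsewhere."""
    n = sum(b.shape[0] for b in blocks)
    out = np.zeros((n, n), dtype=np.result_type(*blocks))
    i = 0
    for b in blocks:
        m = b.shape[0]
        out[i:i + m, i:i + m] = b
        i += m
    return out

rng = np.random.default_rng(3)
C1, D1 = rng.standard_normal((2, 2)), rng.standard_normal((2, 2))
C2, D2 = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))

# dg(C_1, C_2) . dg(D_1, D_2) = dg(C_1 D_1, C_2 D_2)
print(np.allclose(dg(C1, C2) @ dg(D1, D2), dg(C1 @ D1, C2 @ D2)))   # True
```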

Theorem 10.5.5. If the characteristic roots of the $n\times n$ matrix $A$
are $\lambda_1,\dots,\lambda_k$, with multiplicities $m_1,\dots,m_k$ respectively, then there exists
a (non-singular) matrix $S$ and upper triangular matrices $\Lambda_1,\dots,\Lambda_k$
of orders $m_1,\dots,m_k$ respectively, such that

$$S^{-1}AS = \operatorname{dg}(\Lambda_1,\dots,\Lambda_k).$$

In this statement ‘lower triangular’ can, of course, be substituted
for ‘upper triangular’.

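Denote by $L$ the linear transformation of $\mathfrak{V}_n$ into itself such that
$A = \mathfrak{A}(L; \mathfrak{E})$. If $x \in \mathfrak{U}_{\lambda_i}$, then

$$(\lambda_iI-A)^n(Lx) = (\lambda_iI-A)^nAx = A(\lambda_iI-A)^nx = A0 = 0.$$

Thus $x \in \mathfrak{U}_{\lambda_i}$ implies $Lx \in \mathfrak{U}_{\lambda_i}$.

By Theorem 10.5.4 we know that a basis $\mathfrak{F}$ of $\mathfrak{V}_n$ can be selected
having its first $m_1$ vectors in $\mathfrak{U}_{\lambda_1}$, the next $m_2$ vectors in $\mathfrak{U}_{\lambda_2}$, and
so on. In view of the result just proved and (4.2.13) it follows that

$$\mathfrak{A}(L; \mathfrak{F}) = \operatorname{dg}(A_1,\dots,A_k),$$

where $A_1,\dots,A_k$ are of order $m_1,\dots,m_k$ respectively. Thus, for some
non-singular matrix $P$,

$$P^{-1}AP = \operatorname{dg}(A_1,\dots,A_k).$$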


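We now complete the proof by applying Theorem 10.4.1 separately
to the matrices $A_1,\dots,A_k$. There exist non-singular matrices
$Q_1,\dots,Q_k$, of orders $m_1,\dots,m_k$ respectively, such that

$$Q_i^{-1}A_iQ_i = \Lambda_i \qquad (i = 1,\dots,k),$$

where $\Lambda_i$ is an upper triangular matrix of order $m_i$. Writing

$$Q = \operatorname{dg}(Q_1,\dots,Q_k), \qquad S = PQ,$$

we obtain

$$\begin{aligned} S^{-1}AS &= Q^{-1}P^{-1}APQ \\ &= \operatorname{dg}(Q_1^{-1},\dots,Q_k^{-1})\cdot\operatorname{dg}(A_1,\dots,A_k)\cdot\operatorname{dg}(Q_1,\dots,Q_k) \\ &= \operatorname{dg}(Q_1^{-1}A_1Q_1,\dots,Q_k^{-1}A_kQ_k) = \operatorname{dg}(\Lambda_1,\dots,\Lambda_k). \end{aligned}$$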

10.6. Simultaneous similarity transformations 

In the preceding sections we have been investigating canonical 
forms of individual matrices. It is, however, of interest to inquire 
whether several matrices can be reduced to their canonical forms 
by the same transformation. 

Definition 10.6.1. The matrices $A, B, C,\dots$ are SIMULTANEOUSLY
SIMILAR to diagonal (triangular)† matrices if there exists a matrix $S$
such that $S^{-1}AS, S^{-1}BS, S^{-1}CS,\dots$ are all diagonal (triangular).
If, in addition, $S$ is unitary, then $A, B, C,\dots$ are said to be
SIMULTANEOUSLY UNITARILY SIMILAR to diagonal (triangular)
matrices.

The problem is to determine under what conditions the given 
matrices are simultaneously (unitarily) similar to diagonal or 
triangular matrices. As we shall see below, commutativity in pairs 
of the given matrices is the essential condition in each case. To 
simplify the argument we shall restrict ourselves, in the first place, 
to the case of two matrices only. 

10.6.1. We require, to begin with, some preliminary results on 
matrices of a special class. 

Definition 10.6.2. A matrix of TYPE $(r_1,\dots,r_k)$ is a matrix of
order $r_1+\dots+r_k$ having the form $\operatorname{dg}(A_1,\dots,A_k)$, where $A_1,\dots,A_k$ are
of order $r_1,\dots,r_k$ respectively.

The type of a matrix is not defined uniquely. Thus the matrix 



† When we speak of triangular matrices we must, in any one problem, restrict
our attention either to upper triangular matrices or to lower triangular matrices.



X,§10.6 SIMULTANEOUS SIMILARITY TRANSFORMATIONS 317 

may equally well be said to be of type (2,1) or of type (3). However, 
the question whether a given matrix is of a given type can always 
be decided unambiguously. 

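Theorem 10.6.1. Let $\lambda_1,\dots,\lambda_k$ be distinct numbers, and write
$r_1+\dots+r_k = n$. Then an $n\times n$ matrix commutes with

$$D = \operatorname{dg}(\lambda_1I_{r_1},\dots,\lambda_kI_{r_k})$$

if and only if it is of type $(r_1,\dots,r_k)$.

Let $AD = DA$ and write $A$ in the partitioned form

$$A = \begin{pmatrix} A^{(11)} & \cdots & A^{(1k)} \\ \vdots & & \vdots \\ A^{(k1)} & \cdots & A^{(kk)} \end{pmatrix},$$

where $A^{(ij)}$ is an $r_i\times r_j$ matrix. Then

$$AD = \begin{pmatrix} \lambda_1A^{(11)} & \cdots & \lambda_kA^{(1k)} \\ \vdots & & \vdots \\ \lambda_1A^{(k1)} & \cdots & \lambda_kA^{(kk)} \end{pmatrix}, \qquad DA = \begin{pmatrix} \lambda_1A^{(11)} & \cdots & \lambda_1A^{(1k)} \\ \vdots & & \vdots \\ \lambda_kA^{(k1)} & \cdots & \lambda_kA^{(kk)} \end{pmatrix},$$

and therefore $\lambda_jA^{(ij)} = \lambda_iA^{(ij)}$ $(i, j = 1,\dots,k)$. This implies that
$A^{(ij)} = O$ when $i \neq j$, and thus $A$ is of type $(r_1,\dots,r_k)$. Conversely,
if $A$ is of type $(r_1,\dots,r_k)$, then it obviously commutes with $D$.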
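The heart of this proof is the entrywise identity $(AD-DA)_{ij} = (d_j-d_i)a_{ij}$, where $d_1,\dots,d_n$ are the diagonal elements of $D$. The NumPy sketch below checks the identity and shows that a matrix of type $(2,1,3)$ commutes with $D$ while a generic matrix does not. The particular numbers and block sizes are arbitrary choices of ours.

```python
import numpy as np

lams = [2.0, -1.0, 7.0]               # distinct numbers lambda_1, lambda_2, lambda_3
sizes = [2, 1, 3]                     # r_1, r_2, r_3
d = np.repeat(lams, sizes)            # diagonal of D = dg(2 I_2, -1 I_1, 7 I_3)
D = np.diag(d)

rng = np.random.default_rng(4)
A = rng.standard_normal((6, 6))       # a generic 6 x 6 matrix

# (AD - DA)_{ij} = (d_j - d_i) a_{ij}
assert np.allclose(A @ D - D @ A, A * (d[None, :] - d[:, None]))

# Zeroing every a_{ij} with d_i != d_j leaves a matrix of type (2, 1, 3).
same_block = np.equal.outer(d, d)
A_typed = A * same_block

print(np.allclose(A_typed @ D, D @ A_typed))   # True
print(np.allclose(A @ D, D @ A))               # False for this generic A
```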


Exercise 10.6.1. Deduce from Theorem 10.6.1 that, if $\omega_1,\dots,\omega_n$ are
distinct, then a matrix commutes with $\operatorname{dg}(\omega_1,\dots,\omega_n)$ if and only if it is
diagonal.


Theorem 10.6.2. If a matrix $A$ of type $(r_1,\dots,r_k)$ is similar to a
diagonal matrix, then there exists a matrix $S$ of type $(r_1,\dots,r_k)$ such
that $S^{-1}AS$ is diagonal.

Write $r_1+\dots+r_k = n$ and $A = \operatorname{dg}(A_1,\dots,A_k)$, where $A_1,\dots,A_k$
are of order $r_1,\dots,r_k$ respectively. If $X$ is any $p\times p$ matrix and $\omega$
any number, put

$$f_\omega(X) = R(\omega I_p - X) + m_\omega(X) - p.$$

By the rank-multiplicity theorem (p. 214) we have, for all $X$ and $\omega$,

$$f_\omega(X) \geq 0. \tag{10.6.1}$$

Moreover, by Theorem 10.2.3 (p. 294),

$$f_\omega(X) = 0 \qquad (\text{for all } \omega)$$







if and only if $X$ is similar to a diagonal matrix. Now clearly

$$f_\omega(A) = \sum_{i=1}^{k} f_\omega(A_i),$$

and since $A$ is similar to a diagonal matrix, $f_\omega(A) = 0$; so

$$\sum_{i=1}^{k} f_\omega(A_i) = 0$$

for all $\omega$. Hence, by (10.6.1), $f_\omega(A_i) = 0$ for $i = 1,\dots,k$ and all
$\omega$. Each matrix $A_i$ is therefore similar to a diagonal matrix.
For $i = 1,\dots,k$ let $S_i$ be a non-singular matrix and $D_i$ a diagonal
matrix, both of order $r_i$, such that $S_i^{-1}A_iS_i = D_i$. Writing
$S = \operatorname{dg}(S_1,\dots,S_k)$, we obtain at once

$$S^{-1}AS = \operatorname{dg}(D_1,\dots,D_k),$$

and the theorem is therefore proved.

Theorem 10.6.3. Two matrices are simultaneously similar to
diagonal matrices if and only if they commute and each is similar
to a diagonal matrix.

Let $A$, $B$ be given matrices. If there exists a matrix $S$ such that
$S^{-1}AS$, $S^{-1}BS$ are both diagonal, then these two matrices commute
and therefore $A$ and $B$ commute.

Suppose, on the other hand, that $AB = BA$ and that $A$ and $B$
are both similar to diagonal matrices. Let $\lambda_1,\dots,\lambda_k$ be the distinct
characteristic roots of $A$ and let their multiplicities be $r_1,\dots,r_k$
respectively. There exists, then, a matrix $P$ such that

$$P^{-1}AP = \operatorname{dg}(\lambda_1I_{r_1},\dots,\lambda_kI_{r_k}).$$

Now, in view of our hypothesis, $P^{-1}AP$ commutes with $P^{-1}BP$
and hence, by Theorem 10.6.1, $P^{-1}BP$ is of type $(r_1,\dots,r_k)$. Since $B$
is similar to a diagonal matrix, so is $P^{-1}BP$; therefore, by Theorem
10.6.2, there exists a matrix $Q$, of type $(r_1,\dots,r_k)$, such that
$Q^{-1}P^{-1}BPQ$ is diagonal. Moreover, again by Theorem 10.6.1, $Q$
commutes with $P^{-1}AP$, and therefore

$$Q^{-1}P^{-1}APQ = \operatorname{dg}(\lambda_1I_{r_1},\dots,\lambda_kI_{r_k}).$$

Thus $(PQ)^{-1}A(PQ)$ and $(PQ)^{-1}B(PQ)$ are both diagonal, and the
theorem is proved.
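The proof of Theorem 10.6.3 is constructive, and its two steps can be sketched in floating point as follows. The sketch makes these assumptions:

* The two matrices commute and each is similar to a diagonal matrix.
* The helper names are hypothetical.
* Eigenspaces are found from a singular value decomposition with ad hoc tolerances.
* For simplicity, each diagonal block of $P^{-1}BP$ is diagonalized with `numpy.linalg.eig`. That is adequate here because, in the example, each block has distinct characteristic roots; the general case would need the full argument of Theorem 10.6.2.

```python
import numpy as np

def eigenspace_basis(A, lam, tol=1e-8):
    """Columns spanning {x : A x = lam x}, read off from the SVD of A - lam I."""
    _, s, vh = np.linalg.svd(A - lam * np.eye(A.shape[0]))
    return vh[int(np.sum(s > tol)):].conj().T

def distinct_roots(A, merge=1e-6):
    """Distinct characteristic roots of A, merging numerically equal ones."""
    roots = []
    for r in np.linalg.eigvals(A):
        if all(abs(r - q) > merge for q in roots):
            roots.append(r)
    return roots

def simultaneously_diagonalize(A, B):
    """Return S with S^{-1} A S and S^{-1} B S both diagonal, assuming
    AB = BA and that A and B are each similar to a diagonal matrix."""
    # Step 1: P^{-1} A P = dg(l_1 I_{r_1}, ..., l_k I_{r_k}).
    bases = [eigenspace_basis(A, lam) for lam in distinct_roots(A)]
    P = np.hstack(bases)
    sizes = [b.shape[1] for b in bases]
    # Step 2: P^{-1} B P is of type (r_1, ..., r_k) by Theorem 10.6.1,
    # so each diagonal block can be diagonalized on its own.
    Bp = np.linalg.solve(P, B @ P)
    Q = np.zeros(Bp.shape, dtype=complex)
    start = 0
    for r in sizes:
        _, Qi = np.linalg.eig(Bp[start:start + r, start:start + r])
        Q[start:start + r, start:start + r] = Qi
        start += r
    return P @ Q

# Two commuting matrices built from a common (illustrative) change of basis P0.
P0 = np.array([[1.0, 1.0, 0.0],
               [0.0, 1.0, 1.0],
               [1.0, 0.0, 1.0]])
P0inv = np.linalg.inv(P0)
A = P0 @ np.diag([1.0, 1.0, 2.0]) @ P0inv     # repeated root 1
B = P0 @ np.diag([3.0, 4.0, 5.0]) @ P0inv

S = simultaneously_diagonalize(A, B)
DA = np.linalg.solve(S, A @ S)
DB = np.linalg.solve(S, B @ S)
print(np.round(DA, 8))
print(np.round(DB, 8))
```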

Exercise 10.6.2. Write out the proof of Theorem 10.6.3 for the case 
when the characteristic roots of one of the two given matrices are distinct. 

It is worth mentioning that Theorem 10.6.3 can be appreciably
extended. It can, in fact, be shown that the matrices $A_1,\dots,A_m$
are simultaneously similar to diagonal matrices if and only if they
commute in pairs and each is similar to a diagonal matrix.†

10.6.2. To deal with the case of unitary similarity transformations, we shall need the following result.

Theorem 10.6.4. Two commuting matrices possess a common
characteristic vector.

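Suppose that $AB = BA$. Let $\lambda$ be a characteristic root of $A$ and
$\{x_1,\dots,x_k\}$ a basis of the vector space $\mathfrak{U}$ of vectors $x$ satisfying the
equation $Ax = \lambda x$. Since $Ax_i = \lambda x_i$ $(i = 1,\dots,k)$ we obtain

$$A(Bx_i) = B(Ax_i) = \lambda(Bx_i) \qquad (i = 1,\dots,k).$$

Thus $Bx_i \in \mathfrak{U}$ $(i = 1,\dots,k)$ and there exist, therefore, scalars
$\alpha_{ij}$ $(i, j = 1,\dots,k)$ such that

$$Bx_i = \sum_{j=1}^{k}\alpha_{ij}x_j \qquad (i = 1,\dots,k).$$

If $\mu$ is any characteristic root of the $k\times k$ matrix $(\alpha_{ji})$ and $(t_1,\dots,t_k)'$
is any characteristic vector of $(\alpha_{ji})$ associated with $\mu$, then

$$B\Big(\sum_{i=1}^{k}t_ix_i\Big) = \sum_{i=1}^{k}t_i\sum_{j=1}^{k}\alpha_{ij}x_j = \sum_{j=1}^{k}x_j\sum_{i=1}^{k}\alpha_{ij}t_i = \mu\sum_{j=1}^{k}t_jx_j.$$

Thus $x = \sum_{i=1}^{k}t_ix_i$, which is non-zero since $t_1,\dots,t_k$ are not all zero and $x_1,\dots,x_k$ are linearly independent, is a common characteristic vector of $A$ and $B$.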
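The proof of Theorem 10.6.4 can likewise be followed numerically. In the NumPy sketch below, the function name is hypothetical and the tolerance ad hoc.

* The columns of `X` form an orthonormal basis of $\mathfrak{U}$.
* `M` is the matrix of $B$ restricted to $\mathfrak{U}$ in that basis; its $(j, i)$ entry is the coefficient of $x_j$ in $Bx_i$.
* A characteristic vector $t$ of `M` then yields the common characteristic vector $x = t_1x_1+\dots+t_kx_k$.

The two test matrices are built by us so as to commute, by Theorem 10.6.1.

```python
import numpy as np

def common_characteristic_vector(A, B, tol=1e-8):
    """Hypothetical helper following the proof of Theorem 10.6.4 (AB = BA)."""
    n = A.shape[0]
    lam = np.linalg.eigvals(A)[0]                 # a characteristic root of A
    _, s, vh = np.linalg.svd(A - lam * np.eye(n))
    X = vh[int(np.sum(s > tol)):].conj().T        # orthonormal basis x_1..x_k of U
    # B maps U into itself, so B X = X M; M[j, i] is the coefficient of x_j in B x_i.
    M = X.conj().T @ B @ X
    mus, T = np.linalg.eig(M)
    x = X @ T[:, 0]                               # x = t_1 x_1 + ... + t_k x_k
    return x, lam, mus[0]

P0 = np.array([[1.0, 1.0, 0.0],
               [0.0, 1.0, 1.0],
               [1.0, 0.0, 1.0]])
P0inv = np.linalg.inv(P0)
A = P0 @ np.diag([1.0, 1.0, 2.0]) @ P0inv
B = P0 @ np.array([[0.0, 1.0, 0.0],
                   [1.0, 0.0, 0.0],
                   [0.0, 0.0, 3.0]]) @ P0inv       # commutes with A (Theorem 10.6.1)

x, lam, mu = common_characteristic_vector(A, B)
print(np.allclose(A @ x, lam * x), np.allclose(B @ x, mu * x))   # True True
```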

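Theorem 10.6.5. If two matrices commute, then they are simultaneously unitarily similar to triangular matrices.

If $AB = BA$, then, by Theorem 10.6.4, there exists a unit vector
$x$ and scalars $\alpha$, $\beta$ such that $Ax = \alpha x$, $Bx = \beta x$. Let $U$ be any
unitary matrix having $x$ as its first column. Then

$$U^{-1}AU = \begin{pmatrix} \alpha & p \\ 0 & A_1 \end{pmatrix}, \qquad U^{-1}BU = \begin{pmatrix} \beta & q \\ 0 & B_1 \end{pmatrix},$$

where $A_1$, $B_1$ are square matrices of order $n-1$ and $p$, $q$ row vectors
of order $n-1$. Hence

$$U^{-1}ABU = \begin{pmatrix} \alpha\beta & \alpha q+pB_1 \\ 0 & A_1B_1 \end{pmatrix}, \qquad U^{-1}BAU = \begin{pmatrix} \beta\alpha & \beta p+qA_1 \\ 0 & B_1A_1 \end{pmatrix}.$$

Since $AB = BA$ it follows that $A_1B_1 = B_1A_1$; and we now argue
by induction with respect to $n$. Suppose that any two commuting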
† For a proof of this theorem and for many related results on simultaneous
similarity transformations see M. P. Drazin, J. W. Dungey, and K. W. Gruenberg,
J. London Math. Soc. 26 (1951), 221-8.





matrices of order $n-1$ are simultaneously unitarily similar to
triangular matrices. Then there exists a unitary matrix $V_1$, of
order $n-1$, such that $V_1^{-1}A_1V_1$ and $V_1^{-1}B_1V_1$ are both triangular.
Writing

$$V = \begin{pmatrix} 1 & 0 \\ 0 & V_1 \end{pmatrix},$$

we see that $UV$ is unitary, and $(UV)^{-1}A(UV)$, $(UV)^{-1}B(UV)$ are
both triangular. The proof is therefore complete.


Exercise 10.6.3. Deduce Theorem 10.3.8 from Theorem 10.6.5 without 
making use of Theorem 10.4.3. 

Exercise 10.6.4. Show that the converse of Theorem 10.6.5 is false. 


10.6.3. We shall next demonstrate that the restriction of
Theorem 10.6.5 to the case of two matrices is not essential. We
begin with a few obvious remarks.

When we say that the matrices $A_1,\dots,A_k$ are linearly independent
we mean, of course, that the relation

$$\alpha_1A_1 + \dots + \alpha_kA_k = O$$

implies $\alpha_1 = \dots = \alpha_k = 0$. A set of $n \times n$ matrices contains at most $n^2$
linearly independent ones. If $k$ is the maximum number of
linearly independent matrices in the set and if $A_1,\dots,A_k$ are linearly
independent matrices of the set, then every matrix of the set can
be expressed as a linear combination of $A_1,\dots,A_k$.

Theorem 10.6.6. Let $S$ be a (finite or infinite) set of matrices
which commute in pairs. Then all matrices in $S$ possess a common
characteristic vector.


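Let $k$ be the maximum number of linearly independent matrices
in $S$ and let $A_1,\dots,A_k \in S$ be linearly independent. Since every matrix of $S$ is a linear combination of $A_1,\dots,A_k$, it is then
clearly sufficient to show that $A_1,\dots,A_k$ possess a common characteristic vector.

Let $1 \le r < k$ and assume that $A_1,\dots,A_r$ possess a common
characteristic vector, say $x_r$. Then there exist numbers
$\alpha_1,\dots,\alpha_r$ such that $A_ix_r = \alpha_ix_r$ $(i = 1,\dots,r)$.

Let $Y$ denote the set of all vectors $y$ satisfying the $r$ conditions

$$A_iy = \alpha_iy \qquad (i = 1,\dots,r).$$

Then $Y$ is not empty since $x_r \in Y$. Let $s$ be the maximum number
of linearly independent vectors in $Y$, and let $y_1,\dots,y_s \in Y$ be linearly
independent. Then

$$A_iy_j = \alpha_iy_j \qquad (i = 1,\dots,r;\ j = 1,\dots,s), \tag{10.6.2}$$

and we have, for $1 \le i \le r$, $1 \le j \le s$,

$$A_i(A_{r+1}y_j) = A_{r+1}(A_iy_j) = \alpha_i(A_{r+1}y_j).$$

Hence, for $j = 1,\dots,s$, $A_{r+1}y_j \in Y$, and so there exist numbers
$c_{ij}$ $(i, j = 1,\dots,s)$ such that

$$A_{r+1}y_j = \sum_{i=1}^{s} c_{ij}y_i \qquad (j = 1,\dots,s). \tag{10.6.3}$$

There exist a number $\mu$ and numbers $t_1,\dots,t_s$, not all zero, such that†

$$\sum_{j=1}^{s} c_{ij}t_j = \mu t_i \qquad (i = 1,\dots,s). \tag{10.6.4}$$

Write

$$x_{r+1} = \sum_{j=1}^{s} t_jy_j .$$

Since $t_1,\dots,t_s$ are not all zero and $y_1,\dots,y_s$ are linearly independent, it
follows that $x_{r+1} \ne 0$. Moreover, for $1 \le i \le r$ we have, by (10.6.2),

$$A_ix_{r+1} = \sum_{j=1}^{s} t_jA_iy_j = \alpha_i\sum_{j=1}^{s} t_jy_j = \alpha_ix_{r+1}.$$

Again, using (10.6.3) and (10.6.4), we obtain

$$A_{r+1}x_{r+1} = \sum_{j=1}^{s} t_jA_{r+1}y_j = \sum_{j=1}^{s} t_j\sum_{i=1}^{s} c_{ij}y_i = \sum_{i=1}^{s} y_i\sum_{j=1}^{s} c_{ij}t_j = \mu\sum_{i=1}^{s} t_iy_i = \mu x_{r+1}.$$

We see, then, that the matrices $A_1,\dots,A_{r+1}$ possess a common
characteristic vector $x_{r+1}$. Hence, by induction with respect to $r$,
it follows that $A_1,\dots,A_k$ possess a common characteristic vector.
The theorem is therefore established.‡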

We are now able to prove an important generalization, due to 
Frobenius, of Theorem 10.6.5. 

Theorem 10.6.7. Let $S$ be a set of matrices which commute in
pairs. Then the matrices of $S$ are simultaneously unitarily similar to
triangular matrices.

By Theorem 10.6.6, all matrices in $S$ possess a common characteristic
vector, say $x$, which may be assumed to be normalized.

† In fact, $\mu$ is a characteristic root and $(t_1,\dots,t_s)'$ an associated characteristic
vector of the $s \times s$ matrix $(c_{ij})$.

‡ I owe this proof to Professor R. Rado.

If $U$ is a unitary matrix with $x$ as its first column, then, for every
$A \in S$, we have

$$U^{-1}AU = \begin{pmatrix} \alpha & y \\ 0 & A_1 \end{pmatrix}.$$

Here the scalar $\alpha$, the row vector $y$ (of order $n-1$), and the square
matrix $A_1$ (of order $n-1$) depend on $A$. Let $S_1$ be the set of matrices
$A_1$ corresponding to $A \in S$. Since the matrices of $S$ commute in
pairs, so do the matrices of $S_1$. The proof of the theorem can now
be completed in the usual way by induction with respect to $n$.†


Theorem 10.6.8. The matrices of a set $S$ of normal matrices are
simultaneously unitarily similar to diagonal matrices if and only if
they commute in pairs.

If the matrices of a set $S$ of normal matrices are simultaneously
unitarily similar to diagonal matrices, then they obviously commute
in pairs. If, on the other hand, they commute in pairs, then by
Theorem 10.6.7 they are simultaneously unitarily similar to
triangular matrices. Each of these triangular matrices is again
normal and so, by Theorem 10.4.3 (p. 309), diagonal. Hence the
matrices of $S$ are simultaneously unitarily similar to diagonal
matrices.
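Here is a small numerical check of Theorem 10.6.8. It is a sketch under stated assumptions: the two real symmetric (hence normal) commuting matrices are an illustrative example, and the orthogonal matrix $U$ is written down by hand rather than computed. Floating-point results are compared with a tolerance.

```python
import math

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][p] * Y[p][j] for p in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(M):
    return [list(row) for row in zip(*M)]

def off_diagonal_size(M):
    """Largest modulus of an off-diagonal element."""
    return max(abs(M[i][j]) for i in range(len(M))
               for j in range(len(M)) if i != j)

A = [[2.0, 1.0], [1.0, 2.0]]      # real symmetric, hence normal
B = [[0.0, 1.0], [1.0, 0.0]]      # real symmetric, hence normal
assert matmul(A, B) == matmul(B, A)

r = 1 / math.sqrt(2)
U = [[r, r], [r, -r]]             # real orthogonal (so unitary): U^{-1} = U'

DA = matmul(matmul(transpose(U), A), U)
DB = matmul(matmul(transpose(U), B), U)
print(DA)   # approximately dg(3, 1)
print(DB)   # approximately dg(1, -1)
```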


PROBLEMS ON CHAPTER X 


1. Let $A$ be any given matrix. Show that there exist unique matrices $P$,
$Q$ such that: (i) $A = P + Q$, (ii) $P$ is hermitian and $Q$ skew-hermitian. Show
further that $A$ is normal if and only if $PQ = QP$.

2. Show that a matrix $A$ is normal if and only if $|Ax| = |\bar{A}'x|$ for all
vectors $x$.

3. Let $A$ be a normal matrix. Show that (i) if all characteristic roots of
$A$ are real, then $A$ is hermitian; (ii) if all characteristic roots of $A$ are of
modulus 1, then $A$ is unitary.

4. Suppose that A is similar to a diagonal matrix and that a positive power 
of A is equal to O. Show that A = O. 

5. Show that an orthogonal matrix all of whose characteristic roots are
real is necessarily symmetric. 


6. Find a similarity transformation which reduces the real matrix 



at 


to diagonal form. 

7. If $A^2 = I$, show that $\operatorname{tr} A$ is an integer. Show also that, if $A$ is not a
scalar matrix, then $|\operatorname{tr} A| < n$.


† Cf. the proof of Theorem 10.6.5.





8. Express the matrix 



in the form $XKX'$, where $X$ is orthogonal and $K$ diagonal.

9. The following problem was considered in Chapter VII: ‘If all elements
of $A^m$ are bounded as $m \to \infty$, show that all characteristic roots of $A$ are, in
modulus, less than or equal to 1.’ Prove this result by the use of triangular
canonical forms.

10. Let $\lambda_1,\dots,\lambda_n$ be the characteristic roots of $A$ and $\mu_1,\dots,\mu_n$ the (necessarily
real) characteristic roots of $\bar{A}'A$. Show that

$$|\lambda_1|^2 + \dots + |\lambda_n|^2 \le \mu_1 + \dots + \mu_n,$$

with equality if and only if $A$ is normal.

11. Prove the result of Problem VI, 15 by the methods of the present 
chapter. 

12. Show that every real symmetric $3 \times 3$ matrix $A$ whose characteristic
roots are $1, -1, -1$ can be written in the form $A = 2ll' - I_3$, where $l$ is a
suitable unit vector.

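13. Find orthogonal reductions for each of the matrices

$$\text{(i)}\ \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 2 \end{pmatrix}; \qquad \text{(ii)}\ \begin{pmatrix} 0 & 1 & 0 \\ 1 & -3/2 & 0 \\ 0 & 0 & 1 \end{pmatrix}; \qquad \text{(iii)}\ \begin{pmatrix} ? & 3\sqrt{2} & 3\sqrt{2} \\ 3\sqrt{2} & 7 & -6 \\ 3\sqrt{2} & -6 & 7 \end{pmatrix}.$$

14. Find all solutions of the equation

$$X^2 = \dots$$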




15. Find a matrix $X$ such that $2X^2 - 6X = A$, where



16. Suppose that the characteristic roots of $A$ are distinct and that $B$ has
the same characteristic roots as $A$. Show that there exist matrices $Q$ and $R$,
at least one of them non-singular, such that $A = QR$, $B = RQ$.

17. Let $A = (a_{rs})$ be an arbitrary $n \times n$ complex matrix and write

p = max |. Show that |det A | == if and only if the following three 

conditions are satisfied: (i) $A$ is normal; (ii) $|a_{rr}| = p$ $(1 \le r \le n)$; (iii) all

characteristic roots of A are equal in modulus. 

18. Show that $A$ is normal if and only if the characteristic roots of $\bar{A}'A$
are equal to the squares of the moduli of the characteristic roots of $A$.

19. Show that an $n \times n$ matrix is normal if and only if it possesses an
orthonormal set of $n$ characteristic vectors.





20. Show that, if $f(A,B) = \operatorname{tr}(\bar{A}'B)$, then

$$|f(A,B)|^2 \le f(A,A)\cdot f(B,B).$$

Deduce that $N(A+B) \le N(A) + N(B)$,

where $N(A) = \{f(A,A)\}^{1/2}$.

21. Deduce from the inequality (10.4.4), p. 309, that all characteristic 
roots of a hermitian matrix are real. 

22. The matrix $A$ has distinct characteristic roots $\omega_1,\dots,\omega_n$. If $\phi(t)$ is
any function, let $\phi(A)$ be defined by the equation $\phi(A) = f(A)$, where $f(t)$
is a polynomial such that $f(\omega_\nu) = \phi(\omega_\nu)$ $(\nu = 1,\dots,n)$. Show that $\phi(A)$
depends on $A$ and $\phi(t)$ only, but not on the choice of $f(t)$.

23. Use the result of Problem VII, 39 to show that, if $A$ satisfies the equation
$A^2 = I$, then it is similar to a diagonal matrix.

24. Show that a matrix which is similar to a diagonal matrix possesses a
critical principal minor. Deduce that the rank of a skew-hermitian matrix
is even.

25. Let $A$ be a unitary matrix, $B$ a normal matrix, and denote by $\beta$, $\beta^*$
the characteristic roots of $B$ of least and greatest modulus respectively.
By using the result of Problem VIII, 19 show that if $\omega$ is any number which
satisfies $|A - \omega B| = 0$, then $|\beta| \le |\omega|^{-1} \le |\beta^*|$.

26. Find necessary and sufficient conditions for the matrix $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ to
be similar to a diagonal matrix.

27. Explain how to obtain all $n \times n$ matrices $X$ which satisfy the equation
$f(X) = O$, where $f$ is a polynomial with distinct zeros.

Solve the equations

(i) $X^2 = X$; (ii) $X^2 + 4X + 3I = O$.


28. A and B are matrices such that (i) the characteristic roots of A are 
distinct; (ii) every characteristic vector of A is also a characteristic vector 
of B. Show that AB = BA. Show also that if AB = BA and (i) is given, 
then (ii) is valid. 

29. Show that a 2 x 2 matrix which is not similar to any diagonal matrix 
is similar to some matrix of the form 

Let $S$ be the set of all $2 \times 2$ matrices $A$ having the property that any two
matrices which commute with $A$ commute with each other. If $A \in S$ and
$P$ is non-singular, show that $P^{-1}AP \in S$; and deduce that a matrix belongs
to $S$ if and only if it is non-scalar.

30. Show that $A$ is similar to a diagonal matrix if and only if, for every
number $\omega$,

$$(\omega I - A)^2x = 0 \quad\text{implies}\quad (\omega I - A)x = 0.$$


31. If the $n \times n$ matrix $A$ has distinct characteristic roots and $AB = BA$,
show that $B = f(A)$, where $f$ is a polynomial of degree $\le n - 1$.

32. Show that $\bar{A}'$ is equal to a polynomial in $A$ if and only if $A$ is normal.

33. Show, by using either the preceding question or Problem VII, 25,
that $AB = BA$ implies $\bar{A}'B = B\bar{A}'$ for every $B$ if and only if $A$ is normal.





34. Suppose that the matrix $A$ satisfies the equation

$$(A - \omega_1I)(A - \omega_2I)\cdots(A - \omega_kI) = O,$$

where $\omega_1,\dots,\omega_k$ are distinct numbers. If, for any vector $x$,

$$y_i = \prod_{\substack{\kappa=1 \\ \kappa \ne i}}^{k} \frac{A - \omega_\kappa I}{\omega_i - \omega_\kappa}\, x \qquad (i = 1,\dots,k),$$

show that $Ay_i = \omega_iy_i$ $(i = 1,\dots,k)$, and that $x = y_1 + \dots + y_k$. Using
Problem II, 22, deduce that $A$ possesses $n$ linearly independent characteristic
vectors. Hence show that a matrix, whose minimum polynomial is the product
of distinct linear factors, is similar to a diagonal matrix.

35. Let $\omega_1,\dots,\omega_k$ be the distinct values of the characteristic roots of $A$,
and let, for $1 \le i \le k$, $U_i$ denote the space of vectors $x$ such that $Ax = \omega_ix$.
Show that, if the minimum polynomial of $A$ is the product of distinct linear
factors, then $d(U_1) + \dots + d(U_k) = n$.

36. Let $A$ be a given matrix and let $U_1,\dots,U_k$ be defined as in the preceding
question. Using Problem VII, 40, show that, if $d(U_1) + \dots + d(U_k) = n$,
then $A$ is similar to a diagonal matrix.

Deduce Theorem 10.2.3 (p. 294) from this result, and also establish the
converse inference.

37. $A$ and $B$ are commuting $2 \times 2$ matrices. Prove that there exists a
matrix $C$ and polynomials $f$, $g$ such that $A = f(C)$, $B = g(C)$. Show also
that this assertion becomes false for matrices of order greater than 2.

38. Show that, if A and B are 2x2 matrices with the same characteristic 
vectors, then AB = BA. Show also that this inference is false for matrices 
of order greater than 2. 

39. Let $S$ be a set of $n \times n$ matrices such that, if $A, B \in S$, then $A^2 = A$ and
$AB = O$ $(A \ne B)$. Prove, by induction with respect to the number of matrices in $S$,
that the matrices in $S$ are simultaneously similar to diagonal matrices.

Deduce that, if $A_1,\dots,A_k \in S$, then

$$R(A_1 + \dots + A_k) = R(A_1) + \dots + R(A_k),$$

and hence show that $S$ contains at most $n$ non-zero matrices.

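40. Let $A$ be a matrix which is similar to a diagonal matrix, and put

$$E_i = \prod_{\substack{\nu=1 \\ \nu \ne i}}^{k} \frac{A - \alpha_\nu I}{\alpha_i - \alpha_\nu} \qquad (i = 1,\dots,k),$$

where $\alpha_1,\dots,\alpha_k$ are the distinct values of the characteristic roots of $A$. Show
that

(i) $E_iE_j = O$ $(i, j = 1,\dots,k;\ i \ne j)$;

(ii) $E_1 + \dots + E_k = I$;

(iii) $E_i^2 = E_i$ $(i = 1,\dots,k)$;

(iv) $A = \alpha_1E_1 + \dots + \alpha_kE_k$.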
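The identities (i) to (iv) can be checked exactly for a small case. This sketch uses exact rational arithmetic (`fractions.Fraction`) and assumes the illustrative matrix $A$ below, whose characteristic roots 2 and 3 are distinct. It checks one instance only; it is not a solution of the problem.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][p] * Y[p][j] for p in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def matadd(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def scale(c, X):
    return [[c * x for x in row] for row in X]

n = 2
I = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
O = [[Fraction(0)] * n for _ in range(n)]
A = [[Fraction(2), Fraction(1)], [Fraction(0), Fraction(3)]]
alphas = [Fraction(2), Fraction(3)]       # distinct characteristic roots of A

def E(i):
    """E_i = product over v != i of (A - alpha_v I) / (alpha_i - alpha_v)."""
    result = I
    for v, a_v in enumerate(alphas):
        if v != i:
            factor = scale(1 / (alphas[i] - a_v), matadd(A, scale(-a_v, I)))
            result = matmul(result, factor)
    return result

E1, E2 = E(0), E(1)
print(E1)   # E_1 = [[1, -1], [0, 0]]
print(E2)   # E_2 = [[0, 1], [0, 1]]
```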

41. Let $S$ be a set of matrices which commute in pairs and are similar to
diagonal matrices. By considering the E-matrices (defined in the preceding
question) associated with every matrix of $S$, show that the matrices of $S$
are simultaneously similar to diagonal matrices.





42. Show that the matrices of a set $S$ are simultaneously similar to
triangular matrices if and only if there exist linearly independent vectors
$x_1,\dots,x_n$ such that, for any $A \in S$ and any $k$ in the range $1 \le k \le n$,
$Ax_k$ is a linear combination of $x_1,\dots,x_k$.

Deduce that, if the matrices of $S$ are simultaneously similar to triangular
matrices, then they are simultaneously unitarily similar to triangular
matrices.

43. Show that an orthogonal matrix all of whose characteristic roots are 
real is orthogonally similar to a diagonal matrix. 

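44. Let $A$ be an orthogonal matrix, $e^{-i\alpha}$ a non-real characteristic root of
$A$, and $x$ a vector such that $Ax = e^{-i\alpha}x$, $\bar{x}'x = 2$. Verify that $x'x = 0$,
and prove that the vectors $\xi = \tfrac{1}{2}(x + \bar{x})$, $\eta = \tfrac{1}{2i}(x - \bar{x})$ satisfy the relations

$$\xi'\eta = 0, \qquad \xi'\xi = \eta'\eta = 1,$$

$$A\xi = \xi\cos\alpha + \eta\sin\alpha, \qquad A\eta = -\xi\sin\alpha + \eta\cos\alpha.$$

Hence show that if $S$ is any orthogonal matrix with $\xi$, $\eta$ as its first two
columns, then

$$S^{-1}AS = \begin{pmatrix} \cos\alpha & -\sin\alpha & 0 \\ \sin\alpha & \cos\alpha & 0 \\ 0 & 0 & A_1 \end{pmatrix},$$

where $A_1$ is a square matrix of order $n-2$. Deduce that every orthogonal
matrix is orthogonally similar to a matrix of the form

$$\operatorname{dg}(C_1,\dots,C_k,D),$$

where

$$C_\nu = \begin{pmatrix} \cos\alpha_\nu & -\sin\alpha_\nu \\ \sin\alpha_\nu & \cos\alpha_\nu \end{pmatrix} \qquad (\nu = 1,\dots,k;\ \alpha_\nu \text{ real})$$

and $D$ is a diagonal matrix all of whose diagonal elements are $\pm 1$.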

45. Show that, if $A$, $B$, and $AB$ are normal, then so is $BA$.



XI 


MATRIX ANALYSIS 

In our discussion so far we have confined ourselves to purely
algebraic methods (that is, methods based on the four rational
operations) and have avoided reference to limiting processes.†
It is interesting, however, to remove this restriction in order to
see how an analysis of matrices might be initiated. In the first three
sections of the present chapter we propose to develop the more
elementary parts of such a theory, but we shall be able to do no
more than touch the fringe of an extensive subject.‡ In the last
section we shall approach the question from the opposite point of
view and use the theory of matrices to solve a problem in classical
analysis. The distinctive common feature of both lines of inquiry
is, of course, the fusion of algebraic and analytic ideas. The results
obtained in this chapter will not be needed in Part III.

11.1. Convergent matrix sequences 

11.1.1. An analytic theory of matrices must naturally be
founded on the notion of convergence or some equivalent notion.

Definition 11.1.1. The sequence $\{A_m\}$ of $n \times n$ matrices CONVERGES
(or TENDS) to $A$ (in symbols: $A_m \to A$ as $m \to \infty$, or
$\lim_{m\to\infty} A_m = A$) if

$$(A_m)_{rs} \to (A)_{rs} \quad \text{as } m \to \infty \qquad (r, s = 1,\dots,n).$$

A sequence which does not converge is said to diverge.

In other words, $\{A_m\}$ converges to $A$ if each element of $A_m$
converges, as $m \to \infty$, to the corresponding element of $A$.

In many ways the behaviour of sequences of matrices resembles
that of sequences of numbers. Thus, if $A_m \to A$, $B_m \to B$ $(m \to \infty)$, then

$$\alpha A_m + \beta B_m \to \alpha A + \beta B \quad\text{and}\quad A_mB_m \to AB.$$

† The only notable deviation from this course consisted in the appeal to the
fundamental theorem of algebra, which was needed to establish the existence of
characteristic roots of every matrix.

‡ For a more adequate treatment see Ferrar, 15, chap. v; Frazer, Duncan, and
Collar, 16, chap. ii; and Wedderburn, 28, chap. viii.





The last statement implies, in particular, that if $P$, $Q$ are any
matrices, then $PA_mQ \to PAQ$.


Exercise 11.1.1. Prove these results.


Definition 11.1.2. Let $A = (a_{rs})$ be a complex matrix and
$B = (b_{rs})$ a non-negative matrix (i.e. a matrix whose elements are
real non-negative numbers). Then $B$ is said to MAJORIZE $A$ (in
symbols: $A \ll B$) if

$$|a_{rs}| \le b_{rs} \qquad (r, s = 1,\dots,n).$$

Exercise 11.1.2. Let $A$ be a complex matrix, $B$ a non-negative matrix,
and $m$ a positive integer. Show that $B^m$ is non-negative and that, if
$A \ll B$, then $A^m \ll B^m$.

Exercise 11.1.3. Let $\{A_m\}$ be a sequence of complex matrices, $\{B_m\}$ a
sequence of non-negative matrices, and suppose that $A_m \ll B_m$ and $B_m \to O$.
Show that $A_m \to O$.

Definition 11.1.3. For any matrix $A = (a_{rs})$ we write

$$\|A\| = \max_{r,s} |a_{rs}|.$$

Exercise 11.1.4. Let $A$ be a complex matrix and $B$ a non-negative
matrix, and suppose that $A \ll B$. Show that $\|A\| \le \|B\|$.

Exercise 11.1.5. Show that $\|ABC\| \le n^2\,\|A\|\cdot\|B\|\cdot\|C\|$.

Exercise 11.1.6. Show that $A_m \to O$ if and only if $\|A_m\| \to 0$.

The most natural sequences are those formed by powers of a 
single matrix, and some conditions governing the convergence of 
such sequences are given in the next theorem and in Exercise 11.1.8. 

Theorem 11.1.1. A necessary and sufficient condition for the
relation $A^m \to O$ to hold is that the moduli of all the characteristic roots
of $A$ should be less than 1.

Let $\lambda_1,\dots,\lambda_n$ denote the characteristic roots of $A$. By Theorem
10.4.1 (p. 307), there exists a non-singular matrix $U$ and an upper
triangular matrix $\Lambda$ (with diagonal elements $\lambda_1,\dots,\lambda_n$) such that

$$U^{-1}AU = \Lambda.$$

To prove the necessity of the stated condition we note that

$$U^{-1}A^mU = \Lambda^m.$$

Hence $A^m \to O$ implies $\Lambda^m \to O$, and since the diagonal elements
of $\Lambda^m$ are $\lambda_1^m,\dots,\lambda_n^m$, it follows that

$$|\lambda_1| < 1, \quad \dots, \quad |\lambda_n| < 1. \tag{11.1.1}$$




Next, we establish the sufficiency of (11.1.1). We can choose
distinct real numbers $\omega_1,\dots,\omega_n$ and a real number $q$ such that

$$|\lambda_i| < \omega_i < q < 1 \qquad (i = 1,\dots,n).$$

Let $T$ be any upper triangular non-negative matrix majorizing $\Lambda$
and having $\omega_1,\dots,\omega_n$ as its diagonal elements. Since the characteristic
roots of $T$ are distinct, there exists a non-singular
matrix $S$ such that

$$S^{-1}TS = \operatorname{dg}(\omega_1,\dots,\omega_n) = \Omega, \text{ say}.$$

Now $U^{-1}AU \ll T$ and so, by Exercise 11.1.2,

$$U^{-1}A^mU \ll T^m = S\Omega^mS^{-1}.$$

Hence, by Exercises 11.1.4 and 11.1.5,

$$\|U^{-1}A^mU\| \le \|S\Omega^mS^{-1}\| \le n^2\|S\|\cdot\|\Omega^m\|\cdot\|S^{-1}\| \le n^2\|S\|\cdot\|S^{-1}\|\cdot q^m \to 0.$$

Consequently, by Exercise 11.1.6, $U^{-1}A^mU \to O$, and so $A^m \to O$.
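The two cases of Theorem 11.1.1 are easy to observe numerically. In the sketch below the example matrices are illustrative choices and the arithmetic uses plain Python floats. The function `norm` implements $\|A\|$ of Definition 11.1.3. The first matrix has both characteristic roots equal to 0.5, so its powers tend to $O$, although $\|A^m\|$ does not start decreasing at once. The second has both roots equal to 1, and $\|C^m\| = m$ grows without bound.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][p] * Y[p][j] for p in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def norm(M):
    """||M|| = max |m_rs| (Definition 11.1.3)."""
    return max(abs(x) for row in M for x in row)

def power(M, m):
    """M^m by repeated multiplication (M^0 = I)."""
    n = len(M)
    P = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(m):
        P = matmul(P, M)
    return P

A = [[0.5, 10.0], [0.0, 0.5]]   # characteristic roots 0.5, 0.5
C = [[1.0, 1.0], [0.0, 1.0]]    # characteristic roots 1, 1

for m in (1, 2, 5, 20, 100):
    print(m, norm(power(A, m)), norm(power(C, m)))
```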

Exercise 11.1.7. Interpret Theorem 11.1.1 for the case $n = 1$.

Exercise 11.1.8. Show that if the modulus of at least one characteristic
root of $A$ exceeds 1, then the sequence $\{A^m\}$ diverges.

11.1.2. As an application of the result just proved we shall
establish a theorem due to Frobenius (1908) on the location of
characteristic roots of matrices. If the characteristic roots of $A$
are $\lambda_1,\dots,\lambda_n$, we shall write

$$\rho(A) = \max_{1 \le i \le n} |\lambda_i|.$$

Theorem 11.1.2. If $A$ is a complex matrix, $B$ a non-negative
matrix, and $A \ll B$, then $\rho(A) \le \rho(B)$.

Let $\lambda$ be any characteristic root of $A$, and let $\mu_0$ be a characteristic
root of $B$ having maximum modulus. Let $\varepsilon > 0$, and write

$$A' = \frac{1}{|\mu_0| + \varepsilon}A, \qquad B' = \frac{1}{|\mu_0| + \varepsilon}B.$$

All characteristic roots of $B'$ are less than 1 in modulus and so, by
the sufficiency part of Theorem 11.1.1, $B'^m \to O$. But $A' \ll B'$ and
so, by Exercise 11.1.2, $A'^m \ll B'^m$. Hence, by Exercise 11.1.3,
$A'^m \to O$ and so, by the necessity part of Theorem 11.1.1, all
characteristic roots of $A'$ are less than 1 in modulus. Thus
$|\lambda|/(|\mu_0| + \varepsilon) < 1$, and since $\varepsilon$ can be chosen arbitrarily small,
this implies that $|\lambda| \le |\mu_0|$, which is equivalent to the assertion.




Corollary. If $A = (a_{rs})$, $A' = (|a_{rs}|)$, then $\rho(A) \le \rho(A')$.
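The corollary can be illustrated for a $2 \times 2$ matrix, whose characteristic roots follow from the quadratic formula. This is a minimal sketch with an illustrative example matrix. The name `A_abs` plays the role of the matrix $(|a_{rs}|)$, written $A'$ in the corollary.

```python
import cmath
import math

def roots_2x2(M):
    """Characteristic roots of a 2x2 matrix via the quadratic formula."""
    (a, b), (c, d) = M
    tr, det = a + d, a * d - b * c
    disc = cmath.sqrt(tr * tr - 4 * det)
    return (tr + disc) / 2, (tr - disc) / 2

def rho(M):
    """Largest modulus of a characteristic root."""
    return max(abs(z) for z in roots_2x2(M))

A = [[1, -2], [3, 1]]                        # roots 1 +/- i*sqrt(6)
A_abs = [[abs(x) for x in row] for row in A] # roots 1 +/- sqrt(6)
print(rho(A), rho(A_abs))                    # sqrt(7) <= 1 + sqrt(6)
```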

Theorem 11.1.3. If $B$ is any non-negative matrix obtained from
a non-negative matrix $A$ when the latter is ‘bordered’ by rows and
columns, then $\rho(A) \le \rho(B)$.

Let $A$ be of order $n$ and $B$ of order $N$ $(> n)$. Write

$$A' = \begin{pmatrix} A & O \\ O & O \end{pmatrix},$$

a matrix of order $N$. The characteristic roots of $A'$ are the characteristic roots of $A$
together with $N - n$ zeros. Hence $\rho(A) = \rho(A')$. Moreover, since $A' \ll B$,
Theorem 11.1.2 gives $\rho(A') \le \rho(B)$, and the assertion therefore follows.


11.2. Power series and matrix functions 

In the analysis of functions of a scalar variable power series play
a prominent part. It is therefore natural that in seeking to develop
an analysis of matrices we should again make extensive use of this
notion. Most theorems derived below are concerned with power
series, but some of the results they contain are necessarily
fragmentary since a satisfactory discussion of the problems involved
can only be based on the classical canonical form. Here, on the
other hand, we must content ourselves with making what use we
can of triangular and diagonal canonical forms.

11.2.1. In ordinary analysis convergence of infinite series is
defined in terms of the behaviour of their partial sums. The
extension of this idea to matrices is immediate.

Definition 11.2.1. The series of matrices

$$\sum_{m=0}^{\infty} A_m \tag{11.2.1}$$

is said to CONVERGE to, or to have the sum, $S$ if the sequence of partial
sums

$$S_N = \sum_{m=0}^{N} A_m$$

converges to $S$ as $N \to \infty$. A series which does not converge is said to
DIVERGE.

The statement $\sum_{m=0}^{\infty} A_m = S$ thus means that

$$\sum_{m=0}^{\infty} (A_m)_{rs} = (S)_{rs} \qquad (r, s = 1,\dots,n). \tag{11.2.2}$$




Exercise 11.2.1. Show that if $\sum_{m=0}^{\infty} A_m$ is a convergent matrix series, then
$\sum_{m=0}^{\infty} PA_mQ$ is also convergent, and

$$\sum_{m=0}^{\infty} PA_mQ = P\Big(\sum_{m=0}^{\infty} A_m\Big)Q.$$

Definition 11.2.2. The series (11.2.1) is ABSOLUTELY CONVERGENT
if each of the series on the left-hand side of (11.2.2) is absolutely
convergent.

It follows at once from corresponding results in scalar analysis
that, if the series (11.2.1) is absolutely convergent, then it is
convergent and its terms may be rearranged in any manner without
alteration of the sum.


Theorem 11.2.1. The series $\sum_{m=0}^{\infty} A_m$ is absolutely convergent if
and only if $\sum_{m=0}^{\infty} \|A_m\|$ is convergent.

Suppose that $\sum_{m=0}^{\infty} A_m$ is absolutely convergent. Then there exists
a number $K$ independent of $N$, $r$, $s$ such that

$$\sum_{m=0}^{N} |(A_m)_{rs}| \le K \qquad (N \ge 0;\ r, s = 1,\dots,n).$$

Hence

$$\sum_{m=0}^{N} \|A_m\| \le \sum_{m=0}^{N}\sum_{r,s=1}^{n} |(A_m)_{rs}| \le n^2K,$$

and $\sum_{m=0}^{\infty} \|A_m\|$ is therefore convergent.

On the other hand, if $\sum_{m=0}^{\infty} \|A_m\|$ is convergent then, since

$$|(A_m)_{rs}| \le \|A_m\| \qquad (r, s = 1,\dots,n),$$

it follows that each of the series on the left-hand side of (11.2.2) is
absolutely convergent. The theorem is therefore proved.

Theorem 11.2.2. If $\sum_{m=0}^{\infty} A_m$ is absolutely convergent, then so is
$\sum_{m=0}^{\infty} PA_mQ$.

By Exercise 11.1.5 we have

$$\|PA_mQ\| \le n^2\|P\|\cdot\|A_m\|\cdot\|Q\| \le K\|A_m\|,$$

where $K$ is independent of $m$. Now, by Theorem 11.2.1, $\sum_{m=0}^{\infty} \|A_m\|$
is convergent. Hence $\sum_{m=0}^{\infty} \|PA_mQ\|$ is convergent; and therefore,
again by Theorem 11.2.1, $\sum_{m=0}^{\infty} PA_mQ$ is absolutely convergent.

11.2.2. After the preliminary remarks of § 11.2.1 we turn to
the discussion of power series and begin with an easy result on
infinite geometric progressions.

Theorem 11.2.3. If the moduli of all characteristic roots of $A$
are less than 1, then $I - A$ is non-singular, and the series

$$I + A + A^2 + A^3 + \cdots$$

converges to $(I - A)^{-1}$.

The first assertion is obvious, since 1 is not a characteristic root of $A$. Writing

$$S_N = I + A + \dots + A^N,$$

we obtain $S_N(I - A) = I - A^{N+1}$.

Hence, by Theorem 11.1.1, $S_N(I - A) \to I$, so that $S_N \to (I - A)^{-1}$, and the required
conclusion follows.
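Theorem 11.2.3 can be checked by summing the geometric series directly. This is a minimal sketch in plain Python. It assumes the illustrative matrix $A$ below, whose characteristic roots both equal 0.5, and a hypothetical helper `inverse_2x2`. The 200-term cut-off is an arbitrary choice.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][p] * Y[p][j] for p in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def matadd(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def inverse_2x2(M):
    """Inverse of a non-singular 2x2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[0.5, 0.25], [0.0, 0.5]]     # characteristic roots 0.5, 0.5
I = [[1.0, 0.0], [0.0, 1.0]]

S = I          # partial sum S_N = I + A + ... + A^N
P = I          # current power A^N
for _ in range(200):
    P = matmul(P, A)
    S = matadd(S, P)

target = inverse_2x2(matadd(I, [[-x for x in row] for row in A]))   # (I - A)^{-1}
print(S)
print(target)  # [[2.0, 1.0], [0.0, 2.0]] up to the sign of zero
```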


More generally, it is possible to obtain a striking relation between
the matrix power series $\sum_{m=0}^{\infty} c_mA^m$ and the corresponding scalar
power series $\sum_{m=0}^{\infty} c_mz^m$.

Theorem 11.2.4. (i) If all characteristic roots of $A$ lie in the
interior of the circle of convergence of the power series

$$\phi(z) = \sum_{m=0}^{\infty} c_mz^m, \tag{11.2.3}$$

then the matrix power series

$$\sum_{m=0}^{\infty} c_mA^m \tag{11.2.4}$$

converges absolutely. (ii) If at least one characteristic root of $A$ lies
outside the circle of convergence of (11.2.3), then (11.2.4) diverges.

This theorem is due to Weyr (1887). A more complete result was
found in 1926 by Hensel, who dealt fully with the critical cases
when some of the characteristic roots of $A$ lie on the circle of
convergence. Hensel showed, in fact, that the matrix power series
(11.2.4) converges if and only if all characteristic roots of $A$ lie
within or on the circle of convergence of (11.2.3) and satisfy the
further condition that, for every $k$-fold characteristic root $\lambda$ on the
circle of convergence, the power series for $\phi^{(k-1)}(\lambda)$ is convergent.

To establish Theorem 11.2.4 we argue in much the same way as
in the proof of Theorem 11.1.1. Let $R$ be the radius of convergence
of the power series (11.2.3) and denote by $\lambda_1,\dots,\lambda_n$ the characteristic
roots of $A$. Let $U$ be a non-singular matrix and $\Lambda$ an upper triangular
matrix, with diagonal elements $\lambda_1,\dots,\lambda_n$, such that $U^{-1}AU = \Lambda$.

Then the diagonal elements of $\sum_{m=0}^{N} c_m\Lambda^m$ are

$$\sum_{m=0}^{N} c_m\lambda_i^m \qquad (i = 1,\dots,n);$$

and if, for at least one $i$, $|\lambda_i| > R$, then the series

$$\sum_{m=0}^{\infty} c_m\Lambda^m \tag{11.2.5}$$

diverges; in that case (11.2.4) also diverges in view of Exercise 11.2.1.

Suppose, on the other hand, that $|\lambda_1| < R, \dots, |\lambda_n| < R$; and let
$\omega_1,\dots,\omega_n$ be distinct real numbers and $q$ a real number such that

$$|\lambda_i| < \omega_i < q < R \qquad (i = 1,\dots,n).$$

Let $T$ be any upper triangular matrix majorizing $\Lambda$ and having
$\omega_1,\dots,\omega_n$ as its diagonal elements, and let $S$ be a non-singular
matrix such that

$$S^{-1}TS = \operatorname{dg}(\omega_1,\dots,\omega_n) = \Omega, \text{ say}.$$

Then $\Lambda^m \ll S\Omega^mS^{-1}$, and so

$$c_m\Lambda^m \ll |c_m|\,S\Omega^mS^{-1} \qquad (m \ge 0).$$

Hence, by Exercises 11.1.4 and 11.1.5,

$$\|c_m\Lambda^m\| \le \big\|\,|c_m|\,S\Omega^mS^{-1}\big\| \le |c_m|\,n^2\|S\|\cdot\|\Omega^m\|\cdot\|S^{-1}\| \le K|c_m|q^m,$$

where $K$ is independent of $m$. Since $0 < q < R$, it follows that
$\sum_{m=0}^{\infty} \|c_m\Lambda^m\|$ converges. Hence, by Theorem 11.2.1, the series
(11.2.5) converges absolutely; and therefore, by Theorem 11.2.2,
so does the series (11.2.4).

It is worth pointing out that for two special cases the proof of
Theorem 11.2.4 can be simplified still further. In the first instance
assume that $A$ is similar to a diagonal matrix, say

$$S^{-1}AS = \operatorname{dg}(\lambda_1,\dots,\lambda_n),$$

where $\lambda_1,\dots,\lambda_n$ are, of course, the characteristic roots of $A$. Writing

$$\phi_N(z) = \sum_{m=0}^{N} c_mz^m,$$

we obtain

$$\phi_N(A) = S\cdot\operatorname{dg}\{\phi_N(\lambda_1),\dots,\phi_N(\lambda_n)\}\cdot S^{-1}.$$

Now if some $\lambda_i$ lies outside the circle of convergence of (11.2.3), then
the sequence $\{\phi_N(\lambda_i)\}$ does not converge, and so (11.2.4) diverges. If,
on the other hand, all $\lambda_i$ lie within the circle of convergence of
(11.2.3), then

$$\phi_N(\lambda_i) \to \phi(\lambda_i) \quad \text{as } N \to \infty \qquad (i = 1,\dots,n),$$

and therefore the series (11.2.4) converges (absolutely) to

$$S\cdot\operatorname{dg}\{\phi(\lambda_1),\dots,\phi(\lambda_n)\}\cdot S^{-1}. \tag{11.2.6}$$

In this case, then, we not only recognize the fact of convergence
but also obtain an explicit formula for the sum of the matrix power
series (11.2.4).
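For a matrix similar to a diagonal matrix, formula (11.2.6) and the power series itself should give the same sum. This sketch takes $\phi = \exp$ and assumes the example $A$ below, whose characteristic vectors $(1, 0)'$ and $(1, 1)'$ form the columns of $S$. The helper `exp_series` truncates the series after 40 terms, which is an arbitrary choice.

```python
import math

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][p] * Y[p][j] for p in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def matadd(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def exp_series(M, terms=40):
    """Partial sum I + M + M^2/2! + ... with `terms` terms."""
    n = len(M)
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    total = [row[:] for row in term]
    for m in range(1, terms):
        term = [[x / m for x in row] for row in matmul(term, M)]   # M^m / m!
        total = matadd(total, term)
    return total

A = [[1.0, 1.0], [0.0, 2.0]]
S = [[1.0, 1.0], [0.0, 1.0]]       # columns: characteristic vectors for 1 and 2
S_inv = [[1.0, -1.0], [0.0, 1.0]]
assert matmul(S, S_inv) == [[1.0, 0.0], [0.0, 1.0]]

# Formula (11.2.6) with phi = exp.
via_diagonal = matmul(matmul(S, [[math.exp(1.0), 0.0], [0.0, math.exp(2.0)]]), S_inv)
via_series = exp_series(A)
print(via_diagonal)   # [[e, e^2 - e], [0, e^2]]
print(via_series)
```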

We state the second special case in the form of a corollary.

Corollary. If the power series (11.2.3) converges in the whole
complex plane, then the matrix power series (11.2.4) converges
absolutely for every matrix $A$.

An independent proof of this result can be given very easily.
We can show at once by induction with respect to $m$ that, for $m \ge 0$,

$$\|A^m\| \le (n\|A\|)^m.$$

Hence

$$\sum_{m=0}^{N} \|c_mA^m\| \le \sum_{m=0}^{N} |c_m|(n\|A\|)^m \le K,$$

where $K$ is independent of $N$. The required conclusion therefore
follows at once by Theorem 11.2.1.
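The inequality $\|A^m\| \le (n\|A\|)^m$ used in this independent proof can be spot-checked with exact integer arithmetic. This is a minimal sketch. It assumes the illustrative matrix below and checks only the first twelve powers.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][p] * Y[p][j] for p in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def norm(M):
    """||M|| = max |m_rs| (Definition 11.1.3)."""
    return max(abs(x) for row in M for x in row)

A = [[1, 2], [3, 4]]
n = len(A)

checks = []                 # (m, ||A^m||, (n ||A||)^m)
P = [[1, 0], [0, 1]]        # A^0 = I
for m in range(12):
    checks.append((m, norm(P), (n * norm(A)) ** m))
    P = matmul(P, A)

for m, lhs, bound in checks:
    print(m, lhs, bound)
```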

11.2.3. Theorem 11.2.4 suggests a method of defining new 
classes of functions of a matrix variable. Before we can adopt this 
method, however, we need to consider briefly the case of power
series whose sums are rational functions. 



Theorem 11.2.5. Let $\phi(z)$ be a rational function and suppose
that, for $|z| < R$,

$$\sum_{m=0}^{\infty} c_mz^m = \phi(z).$$

If $A$ is any matrix whose characteristic roots are, in modulus, less
than $R$, then $\phi(A)$ exists and is equal to the sum of the series $\sum_{m=0}^{\infty} c_mA^m$.

It is essential to emphasize that $\phi(A)$ is not defined as the sum
of the power series $\sum_{m=0}^{\infty} c_mA^m$; it is defined by Definition 3.7.2
(p. 99), and the fact that it is equal to the sum of the power series
is precisely what Theorem 11.2.5 asserts.

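Write $\phi(z) = f(z)/g(z)$, where $f(z)$, $g(z)$ are polynomials and, by
hypothesis, $g(z) \ne 0$ for $|z| < R$. If $\lambda_1,\dots,\lambda_n$ are the characteristic
roots of $A$, then, by Theorem 7.3.1 (p. 201),

$$|g(A)| = g(\lambda_1)\cdots g(\lambda_n) \ne 0.$$

Hence $\phi(A) = \{g(A)\}^{-1}f(A)$ exists.

Put

$$g(z) = \sum_{m=0}^{k} p_mz^m, \qquad f(z) = \sum_{m=0}^{\infty} q_mz^m,$$

where only finitely many of the $q_m$ are different from 0. We are given that, for $|z| < R$,

$$g(z)\sum_{m=0}^{\infty} c_mz^m = f(z),$$

and this implies that

$$p_0c_m + p_1c_{m-1} + \dots + p_kc_{m-k} = q_m \qquad (m = 0, 1, 2, \dots),$$

where $c_r = 0$ for $r < 0$. Hence

$$\begin{aligned}
g(A)\sum_{m=0}^{\infty} c_mA^m &= (p_0I + p_1A + \dots + p_kA^k)\sum_{m=0}^{\infty} c_mA^m \\
&= \sum_{m=0}^{\infty} p_0c_mA^m + \sum_{m=0}^{\infty} p_1c_mA^{m+1} + \dots + \sum_{m=0}^{\infty} p_kc_mA^{m+k} \\
&= \sum_{m=0}^{\infty} p_0c_mA^m + \sum_{m=0}^{\infty} p_1c_{m-1}A^m + \dots + \sum_{m=0}^{\infty} p_kc_{m-k}A^m \\
&= \sum_{m=0}^{\infty} (p_0c_m + p_1c_{m-1} + \dots + p_kc_{m-k})A^m \\
&= \sum_{m=0}^{\infty} q_mA^m = f(A),
\end{aligned}$$

and therefore

$$\sum_{m=0}^{\infty} c_mA^m = \{g(A)\}^{-1}f(A) = \phi(A).$$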
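Theorem 11.2.5 can be illustrated with $\phi(z) = 1/(2 - z) = \sum_{m\ge 0} z^m/2^{m+1}$, for which $R = 2$. This sketch uses exact rational arithmetic. It assumes the example matrix $A$ below, with characteristic roots 1 and 0, both less than 2 in modulus. Here $\phi(A) = (2I - A)^{-1}$ in the sense of Definition 3.7.2, and the partial sums of the series should approach it. Because $A^2 = A$ in this example, the error after $N + 1$ terms is exactly $A/2^{N+1}$.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][p] * Y[p][j] for p in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def matadd(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def scale(c, X):
    return [[c * x for x in row] for row in X]

def inverse_2x2(M):
    """Inverse of a non-singular 2x2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def norm(M):
    return max(abs(x) for row in M for x in row)

A = [[Fraction(1), Fraction(1)], [Fraction(0), Fraction(0)]]
I = [[Fraction(1), Fraction(0)], [Fraction(0), Fraction(1)]]

# phi(A) = {g(A)}^{-1} f(A) with g(z) = 2 - z and f(z) = 1.
phi_A = inverse_2x2(matadd(scale(2, I), scale(-1, A)))

# Partial sum S_N of sum_m A^m / 2^(m+1), computed exactly.
N = 30
S_N = [[Fraction(0)] * 2 for _ in range(2)]
P = I
for m in range(N + 1):
    S_N = matadd(S_N, scale(Fraction(1, 2 ** (m + 1)), P))
    P = matmul(P, A)

error = matadd(phi_A, scale(-1, S_N))
print(phi_A, norm(error))
```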




Exercise 11.2.2. Interpret Theorem 11.2.3 in the light of Theorems 11.2.4
and 11.2.5. 

We are now able to extend our definition of functions of matrices. 

Definition 11.2.3. If all characteristic roots of $A$ lie in the
interior of the circle of convergence of the power series (11.2.3), then
$\phi(A)$ is defined as the sum of the series (11.2.4).

If $\phi(z)$ is a rational function, then $\phi(A)$ is defined in two ways,
by Definitions 3.7.2 and 11.2.3. However, the preceding theorem
shows that the two definitions of $\phi(A)$ are consistent.

In future, when we use the term ‘matrix function’ we shall mean
the sum of a power series in $A$ and assume that all characteristic
roots of $A$ lie in the interior of the circle of convergence of this power
series.

11.2.4. Since the power series for $\exp z$, $\cos z$, $\sin z$ converge
everywhere, it follows that the matrix functions

$$\exp A = I + \frac{A}{1!} + \frac{A^2}{2!} + \frac{A^3}{3!} + \cdots,$$

$$\cos A = I - \frac{A^2}{2!} + \frac{A^4}{4!} - \cdots,$$

$$\sin A = A - \frac{A^3}{3!} + \frac{A^5}{5!} - \cdots$$

are defined for every matrix $A$. From this it follows at once that

$$\exp(iA) = \cos A + i\sin A;$$

and this, in turn, leads to the identities

$$\cos A = \tfrac{1}{2}\{\exp(iA) + \exp(-iA)\}, \qquad \sin A = \tfrac{1}{2i}\{\exp(iA) - \exp(-iA)\}. \tag{11.2.7}$$

Exercise 11.2.3. Define the hyperbolic functions of $A$ and derive the
basic identities connecting them.

The actual evaluation of a function $\phi(A)$ may be carried out in
different ways. We can, for instance, make use of the diagonal
canonical form (if $A$ possesses such a form) and obtain $\phi(A)$ by
means of the expression (11.2.6). Alternatively we can sum the
power series defining $\phi(A)$ by appealing to the theorem of Cayley
and Hamilton. As a simple example of the latter procedure consider
a matrix $A$ of order 4 whose characteristic roots are $\pi$, $-\pi$, 0, 0.
Then $A$ satisfies the equation $A^4 - \pi^2A^2 = O$, so that $A^{2k+1} = \pi^{2k-2}A^3$ $(k \ge 1)$, and we have

$$\sin A = A - \frac{A^3}{3!} + \frac{A^5}{5!} - \frac{A^7}{7!} + \cdots = A + \pi^{-3}\Big(-\frac{\pi^3}{3!} + \frac{\pi^5}{5!} - \frac{\pi^7}{7!} + \cdots\Big)A^3 = A + \pi^{-3}(\sin\pi - \pi)A^3 = A - \pi^{-2}A^3.$$

In this case, then, $\sin A$ is equal to a polynomial in $A$. We shall see
in § 11.3 that this is not due to the specific circumstances of the
problem but illustrates a general result on matrix functions.
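The closed form $\sin A = A - \pi^{-2}A^3$ can be compared with a truncated sine series. This sketch assumes a concrete $4 \times 4$ matrix, chosen for illustration, made of the diagonal block $\operatorname{dg}(\pi, -\pi)$ and a $2 \times 2$ nilpotent block. Its characteristic roots are therefore $\pi, -\pi, 0, 0$, as in the text. The series is cut off after 40 terms.

```python
import math

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][p] * Y[p][j] for p in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def matadd(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def sin_series(M, terms=40):
    """Partial sum M - M^3/3! + M^5/5! - ... with `terms` terms."""
    M2 = matmul(M, M)
    term = [row[:] for row in M]          # k = 0 term: M
    total = [row[:] for row in M]
    for k in range(1, terms):
        # term_k = -term_{k-1} M^2 / ((2k)(2k+1)) = (-1)^k M^(2k+1) / (2k+1)!
        c = -1.0 / ((2 * k) * (2 * k + 1))
        term = [[c * x for x in row] for row in matmul(term, M2)]
        total = matadd(total, term)
    return total

pi = math.pi
A = [[pi, 0.0, 0.0, 0.0],
     [0.0, -pi, 0.0, 0.0],
     [0.0, 0.0, 0.0, 1.0],
     [0.0, 0.0, 0.0, 0.0]]

A3 = matmul(matmul(A, A), A)
closed_form = [[A[i][j] - A3[i][j] / pi ** 2 for j in range(4)] for i in range(4)]
series = sin_series(A)
print(closed_form)
print(series)
```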

The question to which we must now turn is whether functional 
equations of scalar analysis continue to remain valid when the 
scalar variable z is replaced by a matrix A. The next theorem 
furnishes us with most of the information we need.

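Theorem 11.2.6. Let $\phi(z)$, $\psi(z)$, $\chi(z)$ be sums of power series
convergent for $|z| < R$ and suppose that

$$\phi(z)\psi(z) = \chi(z) \qquad (|z| < R).$$

If all characteristic roots of $A$ are less than $R$ in modulus, then

$$\phi(A)\psi(A) = \chi(A).$$

Write

$$\phi(z) = \sum_{m=0}^{\infty} a_mz^m, \qquad \psi(z) = \sum_{m=0}^{\infty} b_mz^m, \qquad \chi(z) = \sum_{m=0}^{\infty} c_mz^m,$$

so that

$$c_m = a_0b_m + a_1b_{m-1} + \dots + a_mb_0 \qquad (m = 0, 1, 2, \dots).$$

Let

$$d_m = \sum_{\nu=0}^{m} |a_\nu|\,|b_{m-\nu}|.$$

Then $\sum_{m=0}^{\infty} d_mz^m$ is convergent for $|z| < R$ and therefore, by Theorem
11.2.4 (i), $\sum_{m=0}^{\infty} d_mA^m$ is absolutely convergent. Hence, by Theorem
11.2.1,

$$\sum_{m=0}^{\infty}\Big(\sum_{\nu=0}^{m} |a_\nu|\,|b_{m-\nu}|\Big)\|A^m\| = \sum_{m=0}^{\infty} \|d_mA^m\|$$

is convergent. Since this is a series of non-negative terms, we see
that

$$\sum_{m=0}^{\infty}\sum_{\nu=0}^{m} |a_\nu|\,|b_{m-\nu}|\,\|A^m\| = \sum_{m=0}^{\infty}\sum_{\nu=0}^{m} \|a_\nu b_{m-\nu}A^m\|$$

is convergent. Hence, again by Theorem 11.2.1, the matrix series

$$\sum_{m=0}^{\infty}\sum_{\nu=0}^{m} a_\nu b_{m-\nu}A^m$$

is absolutely convergent and may therefore be rearranged. Accordingly,
we have

$$\begin{aligned}
\chi(A) = \sum_{m=0}^{\infty} c_mA^m &= \sum_{m=0}^{\infty}\Big(\sum_{\nu=0}^{m} a_\nu b_{m-\nu}\Big)A^m = \sum_{m=0}^{\infty}\sum_{\nu=0}^{m} a_\nu b_{m-\nu}A^m \\
&= \sum_{\nu=0}^{\infty} a_\nu\sum_{m=\nu}^{\infty} b_{m-\nu}A^m = \sum_{\nu=0}^{\infty}\sum_{\mu=0}^{\infty} a_\nu b_\mu A^{\nu+\mu} = \sum_{\nu=0}^{\infty} a_\nu A^\nu\sum_{\mu=0}^{\infty} b_\mu A^\mu \\
&= \phi(A)\psi(A).
\end{aligned}$$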


Exercise 11.2.4. Prove Theorem 11.2.6 for the case of a matrix $A$ which
is similar to a diagonal matrix.

Exercise 11.2.5. Let $\phi(z)$, $\psi(z)$, $\chi(z)$ be sums of power series convergent
for $|z| < R$, and suppose that $\phi(z) + \psi(z) = \chi(z)$. Show that if all
characteristic roots of $A$ are less than $R$ in modulus, then $\phi(A) + \psi(A) = \chi(A)$.

By means of Theorem 11.2.6 and Exercise 11.2.5 we can derive
functional equations for matrices from corresponding scalar results.
Thus, since $\exp z \cdot \exp(-z) = 1$, it follows that, for all $A$,

$$\exp A \cdot \exp(-A) = I.$$

This equation shows that $\exp A$ is non-singular for all $A$, and that

$$(\exp A)^{-1} = \exp(-A). \tag{11.2.8}$$

Again, we have $\cos 2z = \cos^2 z - \sin^2 z$. Hence, for all $A$,

$$\cos 2A = (\cos A)^2 - (\sin A)^2,$$

and it is clear how other relations of the same type may be obtained.
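The two functional equations above can be spot-checked with truncated power series. This sketch assumes the arbitrary example matrix $A$ below and a hypothetical helper `power_series`, which sums $\sum c_mA^m$ for a given list of coefficients. The 60-term cut-off is also an arbitrary choice.

```python
import math

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][p] * Y[p][j] for p in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def matadd(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def scale(c, X):
    return [[c * x for x in row] for row in X]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def power_series(M, coeffs):
    """Partial sum of sum_m coeffs[m] * M^m (coeffs[0] multiplies I)."""
    n = len(M)
    P = identity(n)
    total = [[0.0] * n for _ in range(n)]
    for c in coeffs:
        total = matadd(total, scale(c, P))
        P = matmul(P, M)
    return total

N = 60
exp_c = [1.0 / math.factorial(m) for m in range(N)]
cos_c = [(-1) ** (m // 2) / math.factorial(m) if m % 2 == 0 else 0.0 for m in range(N)]
sin_c = [(-1) ** (m // 2) / math.factorial(m) if m % 2 == 1 else 0.0 for m in range(N)]

A = [[0.3, 1.0], [-0.5, 0.2]]

product = matmul(power_series(A, exp_c), power_series(scale(-1.0, A), exp_c))  # exp A . exp(-A)
lhs = power_series(scale(2.0, A), cos_c)                                       # cos 2A
cosA, sinA = power_series(A, cos_c), power_series(A, sin_c)
rhs = matadd(matmul(cosA, cosA), scale(-1.0, matmul(sinA, sinA)))              # (cos A)^2 - (sin A)^2
print(product)
print(lhs)
print(rhs)
```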

The principal result expressing the behaviour of the scalar
exponential function is the addition theorem

$$\exp x \cdot \exp y = \exp(x + y).$$

We are naturally interested to know whether the analogous result
is true for matrices, i.e. whether the identity

$$\exp A \cdot \exp B = \exp(A + B) \tag{11.2.9}$$

is valid. It is easy to see that this is not the case. Indeed, the 
validity of (11.2.9) for all A and B would imply that exp A and 
exp B always commute, which in general they do not. Consider, 
for example, the matrices 




XI, f U.2 POWEB SEBIES AND MATRIX FUNCTIONS 389 

By the theorem of Cayley and Hamilton we have A® == A, B® = B, 
and it is therefore easily verified that 

expA = I+(e-l)A = I® 
expB = I+(e-l)B = (® ^-®|. 

Hence 

expA.expB = expB.expA = 

and this suffices to show that (11.2.9) is not generally valid. In the 
particular case under consideration we have, moreover, 



so that (A+B)2 = 2(A+B). Therefore 

exp(A4-B) = I+|(e®-l)(A+B) = J). 

The fact that so fundamental a law as the addition theorem for 
the exponential function is not valid for matrices shows that 
matrix analysis does not follow a course analogous to that of scalar 
analysis but diverges from it almost at the start. 
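For a concrete check of this failure of the addition theorem, the sketch below uses a hypothetical pair of our own choosing, $A = \begin{pmatrix}1&0\\0&0\end{pmatrix}$ and $B = \begin{pmatrix}1&1\\0&0\end{pmatrix}$. This pair satisfies $A^2 = A$, $B^2 = B$ and $(A+B)^2 = 2(A+B)$, but $AB \ne BA$. It is not claimed to be the pair used above.

With the formulas of the text one finds:

- $\exp A\cdot\exp B = \begin{pmatrix}e^2 & e^2-e\\0&1\end{pmatrix}$;
- $\exp B\cdot\exp A = \begin{pmatrix}e^2 & e-1\\0&1\end{pmatrix}$;
- $\exp(A+B) = \begin{pmatrix}e^2 & \frac12(e^2-1)\\0&1\end{pmatrix}$.

The truncated exponential series (40 terms, an assumption adequate for these small matrices) reproduces these values.

```python
import math


def mat_mul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def mat_exp(A, terms=40):
    """Truncated exponential series: sum of A**m / m! for m < terms."""
    n = len(A)
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    total = [row[:] for row in term]
    for m in range(1, terms):
        term = [[x / m for x in row] for row in mat_mul(term, A)]
        total = [[t + x for t, x in zip(rt, rx)] for rt, rx in zip(total, term)]
    return total


# Hypothetical pair: A^2 = A, B^2 = B, (A+B)^2 = 2(A+B), but AB != BA.
A = [[1.0, 0.0], [0.0, 0.0]]
B = [[1.0, 1.0], [0.0, 0.0]]
A_plus_B = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

exp_A, exp_B = mat_exp(A), mat_exp(B)
AB_order = mat_mul(exp_A, exp_B)   # exp A . exp B
BA_order = mat_mul(exp_B, exp_A)   # exp B . exp A
exp_sum = mat_exp(A_plus_B)        # exp(A + B)

e = math.e
for name, M in [("exp A . exp B", AB_order), ("exp B . exp A", BA_order),
                ("exp(A + B)", exp_sum)]:
    print(name, [[round(x, 4) for x in row] for row in M])
```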

There exists, however, a valid modification of the addition theorem. To obtain it, we shall consider rectangular matrices whose elements are functions of a single variable, say t. We shall denote matrices of this type by symbols such as $A(t) = (a_{ij}(t))$.

Definition 11.2.4. The matrix $A(t) = (a_{ij}(t))$ is said to be differentiable if all its elements $a_{ij}(t)$ are differentiable. Its derivative is then defined by the formula

$$\dot{A}(t) = \frac{d}{dt}A(t) = (\dot{a}_{ij}(t)).$$

Exercise 11.2.6. Let $A(t)$, $B(t)$ be rectangular differentiable matrices, for which the product $A(t)B(t)$ is defined. Show that

$$\frac{d}{dt}\{A(t)B(t)\} = \dot{A}(t)B(t) + A(t)\dot{B}(t).$$

Theorem 11.2.7. For any matrix A we have

$$\frac{d}{dt}\exp(tA) = A\exp(tA).$$



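To prove this identity, we observe that

$$\{\exp(tA)\}_{ij} = \sum_{m=0}^{\infty} \frac{t^m}{m!}\{A^m\}_{ij}.$$

Now the power series in t on the right-hand side converges for all values of t. It may, therefore, be differentiated term by term, and so

$$\frac{d}{dt}\bigl[\{\exp(tA)\}_{ij}\bigr] = \sum_{m=1}^{\infty} \frac{t^{m-1}}{(m-1)!}\{A^m\}_{ij}.$$

Hence

$$\frac{d}{dt}\exp(tA) = \sum_{m=1}^{\infty} \frac{t^{m-1}}{(m-1)!}A^m = A\sum_{m=1}^{\infty} \frac{t^{m-1}}{(m-1)!}A^{m-1} = A\exp(tA).$$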
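Theorem 11.2.7 can be checked numerically by comparing a central difference quotient of $\exp(tA)$ with $A\exp(tA)$. This is a sketch with assumptions of ours:

- the matrix `A`, the point `t` and the step `h` are hypothetical choices;
- $\exp$ is approximated by 40 terms of its series.

The deviation should be of the order of $h^2$, up to rounding.

```python
import math


def mat_mul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def mat_exp(A, terms=40):
    """Truncated exponential series: sum of A**m / m! for m < terms."""
    n = len(A)
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    total = [row[:] for row in term]
    for m in range(1, terms):
        term = [[x / m for x in row] for row in mat_mul(term, A)]
        total = [[s + x for s, x in zip(rs, rx)] for rs, rx in zip(total, term)]
    return total


def scale(c, A):
    return [[c * x for x in row] for row in A]


A = [[0.0, 1.0], [-2.0, -3.0]]   # hypothetical test matrix
t, h = 0.7, 1e-5

# Central difference approximation to d/dt exp(tA)
plus, minus = mat_exp(scale(t + h, A)), mat_exp(scale(t - h, A))
derivative = [[(p - q) / (2 * h) for p, q in zip(rp, rq)] for rp, rq in zip(plus, minus)]

# Right-hand side of Theorem 11.2.7
expected = mat_mul(A, mat_exp(scale(t, A)))
error = max(abs(d - x) for rd, rx in zip(derivative, expected) for d, x in zip(rd, rx))
print(f"max deviation: {error:.2e}")
```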


Theorem 11.2.8. If A and B commute, then

$$\exp A \cdot \exp B = \exp B \cdot \exp A = \exp(A+B).$$

We first observe that if A and B commute, then $\exp(tA)$ and B commute also. For, using Exercise 11.2.1 (p. 331), we have

$$\exp(tA)\cdot B = \Bigl(\sum_{m=0}^{\infty}\frac{t^m}{m!}A^m\Bigr)B = \sum_{m=0}^{\infty}\frac{t^m}{m!}A^mB = \sum_{m=0}^{\infty}\frac{t^m}{m!}BA^m = B\cdot\exp(tA).$$

We next employ an argument which is modelled on the proof of the scalar functional equation $\exp x \cdot \exp y = \exp(x+y)$. Writing

$$C(t) = \exp\{t(A+B)\}\cdot\exp(-tA)\cdot\exp(-tB)$$

and using Exercise 11.2.6 and Theorem 11.2.7, we obtain

$$\dot{C}(t) = (A+B)\exp\{t(A+B)\}\cdot\exp(-tA)\cdot\exp(-tB) + \exp\{t(A+B)\}\cdot(-A)\cdot\exp(-tA)\cdot\exp(-tB)$$
$$+ \exp\{t(A+B)\}\cdot\exp(-tA)\cdot(-B)\cdot\exp(-tB).$$

Hence, in view of the preceding remarks, $\dot{C}(t) = O$. The remarks show that A and B commute with $\exp\{t(A+B)\}$, and that B commutes with $\exp(-tA)$. Thus $C(t)$ is independent of t, and $C(1) = C(0) = I$, i.e.

$$\exp(A+B)\cdot\exp(-A)\cdot\exp(-B) = I,$$

and so, by (11.2.8),

$$\exp(A+B) = \exp B\cdot\exp A.$$



Hence, by symmetry,

$$\exp(A+B) = \exp A\cdot\exp B.$$

From Theorem 11.2.8 and the identities (11.2.7) on p. 336, we can easily deduce the addition theorems for trigonometric functions of a matrix variable. Thus, if A and B commute, then

$$\cos(A+B) = \cos A\cos B - \sin A\sin B,$$

and so on. Broadly speaking, we may say that functional equations in a single scalar variable retain their validity for matrices, but functional equations in two variables remain true in matrix algebra only for pairs of commuting matrices.
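The contrast between commuting and non-commuting pairs shows up clearly in floating point. The sketch below uses hypothetical matrices of our own choosing:

- B is taken to be the polynomial $\tfrac12 A - 0.3I$ in A, so AB = BA and Theorem 11.2.8 applies;
- C and D are a non-commuting pair, for which $\exp C\cdot\exp D = \begin{pmatrix}2&1\\1&1\end{pmatrix}$ differs from $\exp(C+D) = \begin{pmatrix}\cosh 1&\sinh 1\\ \sinh 1&\cosh 1\end{pmatrix}$.

The exponential is again a truncated series (60 terms here).

```python
def mat_mul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def mat_exp(A, terms=60):
    """Truncated exponential series: sum of A**m / m! for m < terms."""
    n = len(A)
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    total = [row[:] for row in term]
    for m in range(1, terms):
        term = [[x / m for x in row] for row in mat_mul(term, A)]
        total = mat_add(total, term)
    return total


def max_dev(X, Y):
    return max(abs(x - y) for rx, ry in zip(X, Y) for x, y in zip(rx, ry))


A = [[1.0, 2.0], [3.0, 0.0]]
# B = 0.5 A - 0.3 I is a polynomial in A, hence commutes with A (hypothetical choice).
B = [[0.5 * a - (0.3 if i == j else 0.0) for j, a in enumerate(row)]
     for i, row in enumerate(A)]
commuting_gap = max_dev(mat_mul(mat_exp(A), mat_exp(B)), mat_exp(mat_add(A, B)))

# A non-commuting pair for contrast: CD != DC.
C = [[0.0, 1.0], [0.0, 0.0]]
D = [[0.0, 0.0], [1.0, 0.0]]
noncommuting_gap = max_dev(mat_mul(mat_exp(C), mat_exp(D)), mat_exp(mat_add(C, D)))
print(f"{commuting_gap:.1e} {noncommuting_gap:.3f}")
```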

11.3. The relation between matrix functions and matrix polynomials

Every term of a power series in A is equal to a polynomial in A whose degree is less than that of the minimum polynomial of A. This suggests that the sum of every power series in A is also equal to a polynomial in A. To prove this is the object of the present section.

Theorem 11.3.1. Let $C_1, \dots, C_k$ be linearly independent matrices, $t_{mi}$ ($m \ge 1$; $i = 1, \dots, k$) given numbers, and

$$T_m = \sum_{i=1}^{k} t_{mi} C_i \quad (m \ge 1). \qquad (11.3.1)$$

If $\lim_{m\to\infty} T_m$ exists, then $\lim_{m\to\infty} t_{mi}$ exists for every i in the range $1 \le i \le k$.

For $r, s = 1, \dots, n$, we denote by $I^{(r,s)}$ the matrix whose (r, s)th element is equal to 1 and all of whose remaining elements are equal to 0. Writing

$$(T_m)_{rs} = \gamma_{rsm} \quad (r, s = 1, \dots, n;\ m \ge 1),$$

we have

$$T_m = \sum_{r,s=1}^{n} \gamma_{rsm} I^{(r,s)} \quad (m \ge 1). \qquad (11.3.2)$$

The existence of $\lim_{m\to\infty} T_m$ means that

$$\lim_{m\to\infty} \gamma_{rsm} \ \text{exists for all}\ r, s = 1, \dots, n. \qquad (11.3.3)$$

By Theorem 2.3.5 (p. 54) we know that matrices $C_{k+1}, \dots, C_{n^2}$ can be found such that $C_1, \dots, C_{n^2}$ constitute a basis of the linear manifold of all $n \times n$ matrices. Every matrix can then be represented as a unique linear combination of $C_1, \dots, C_{n^2}$. In particular, we write

$$I^{(r,s)} = \sum_{i=1}^{n^2} \lambda_{rsi} C_i \quad (r, s = 1, \dots, n).$$

Hence, by (11.3.2),

$$T_m = \sum_{i=1}^{n^2} \Bigl(\sum_{r,s=1}^{n} \gamma_{rsm}\lambda_{rsi}\Bigr) C_i \quad (m \ge 1),$$

and therefore, by (11.3.1),

$$t_{mi} = \sum_{r,s=1}^{n} \gamma_{rsm}\lambda_{rsi} \quad (m \ge 1;\ i = 1, \dots, k).$$

The assertion now follows at once by (11.3.3).


Theorem 11.3.2. Every function $\phi(A)$ of a matrix A is equal to a polynomial in A whose degree is less than the degree of the minimum polynomial of A.

This result may come as a disappointment to the reader, but it must not be taken to imply that the terms 'matrix function' and 'matrix polynomial' are synonymous and that the former may therefore be discarded. Indeed, if $\phi(A) = p(A)$, where p is a polynomial, it does not follow that $\phi(B) = p(B)$, since the relation between $\phi$ and p is not one of identity but depends on A.†

To prove the theorem we write

$$\phi(A) = \sum_{\nu=0}^{\infty} c_\nu A^\nu, \qquad \phi_m(A) = \sum_{\nu=0}^{m} c_\nu A^\nu.$$

If k is the degree of the minimum polynomial of A, then every polynomial in A is equal to a suitable polynomial in A of degree not exceeding $k-1$; and in particular we can write

$$A^\nu = \sum_{i=0}^{k-1} p_{\nu i} A^i \quad (\nu \ge 0).$$

Putting

$$t_{mi} = \sum_{\nu=0}^{m} c_\nu p_{\nu i} \quad (m \ge 0;\ i = 0, 1, \dots, k-1),$$

we obtain

$$\phi_m(A) = \sum_{i=0}^{k-1} t_{mi} A^i.$$

Now $\lim_{m\to\infty} \phi_m(A) = \phi(A)$; furthermore, the matrices $I, A, \dots, A^{k-1}$ are linearly independent by the definition of k. Hence, by Theorem 11.3.1, there exist numbers $t_0, \dots, t_{k-1}$ such that

$$\lim_{m\to\infty} t_{mi} = t_i \quad (i = 0, 1, \dots, k-1),$$

and therefore

$$\phi(A) = \lim_{m\to\infty}\phi_m(A) = \sum_{i=0}^{k-1} t_i A^i.$$




† Cf. Theorem 7.4.3 (p. 204), of which Theorem 11.3.2 is a generalization.

The assertion is therefore established, and it is interesting to note that we have actually proved a little more than Theorem 11.3.2. For in the argument above we used only the fact that $\phi(A)$ is the sum of a convergent power series without relying on the stronger assumption that $\phi(A)$ is a matrix function in the sense of Definition 11.2.3.
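Theorem 11.3.2 can be made concrete for a matrix whose minimum polynomial has degree 2: then $\exp A = c_0I + c_1A$ for suitable numbers $c_0, c_1$. The sketch below finds them by interpolating $e^z$ at the two characteristic roots. This is Lagrange interpolation, a standard technique that is not the argument used above; it assumes the roots are distinct and known.

The matrix is a hypothetical example with characteristic roots 3 and -2. The last check illustrates the remark that the relation between $\phi$ and p depends on A: the same polynomial p fails for the zero matrix.

```python
import math


def mat_mul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def mat_exp(A, terms=40):
    """Truncated exponential series: sum of A**m / m! for m < terms."""
    n = len(A)
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    total = [row[:] for row in term]
    for m in range(1, terms):
        term = [[x / m for x in row] for row in mat_mul(term, A)]
        total = [[s + x for s, x in zip(rs, rx)] for rs, rx in zip(total, term)]
    return total


def max_dev(X, Y):
    return max(abs(x - y) for rx, ry in zip(X, Y) for x, y in zip(rx, ry))


A = [[1.0, 2.0], [3.0, 0.0]]   # hypothetical; characteristic roots 3 and -2
l1, l2 = 3.0, -2.0

# p(z) = c0 + c1 z agrees with e^z at both characteristic roots
c1 = (math.exp(l1) - math.exp(l2)) / (l1 - l2)
c0 = (l1 * math.exp(l2) - l2 * math.exp(l1)) / (l1 - l2)


def p(M):
    """The polynomial c0 I + c1 M, of degree 1 < 2 = degree of the minimum polynomial of A."""
    return [[(c0 if i == j else 0.0) + c1 * x for j, x in enumerate(row)]
            for i, row in enumerate(M)]


print(max_dev(p(A), mat_exp(A)))   # small: exp A = p(A)
Z = [[0.0, 0.0], [0.0, 0.0]]
print(max_dev(p(Z), mat_exp(Z)))   # large: exp O = I, but p(O) = c0 I
```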

11.4. Systems of linear differential equations

11.4.1. In the present section we shall be concerned, as we were in the latter part of § 11.2.4, with rectangular matrices whose elements are functions of a single variable. We begin with a few definitions.

The derivative of $A(t) = (a_{ij}(t))$ has already been defined. Analogously, we denote by $\int_p^q A(t)\,dt$ the matrix whose (i, j)th element is $\int_p^q a_{ij}(t)\,dt$. The matrix A(t) is called integrable, continuous, or bounded respectively if all its elements have the property in question.

In conformity with Definition 11.1.3 (p. 328), we write

$$\|A(t)\| = \max_{i,j} |\{A(t)\}_{ij}|.$$

If A(t) is continuous, then $\|A(t)\|$ is continuous and therefore integrable.

Exercise 11.4.1. Let f(u), g(u) be continuous functions such that $|f(u)| \le g(u)$ for $0 \le |u| \le |t|$. Show that

$$\Bigl|\int_0^t f(u)\,du\Bigr| \le \Bigl|\int_0^t g(u)\,du\Bigr|,$$

and deduce that, if C(t) is continuous, then

$$\Bigl\|\int_0^t C(u)\,du\Bigr\| \le \Bigl|\int_0^t \|C(u)\|\,du\Bigr|.$$

For $r \ge 0$ we define

$$m(r; A) = \operatorname*{bd}_{|t| \le r} \|A(t)\|.$$

If P(t), Q(t) are $n \times n$ matrices, then obviously

$$\|P(t)Q(t)\| \le n\|P(t)\|\cdot\|Q(t)\|,$$

and therefore

$$\|P(u)Q(u)\| \le n\|P(u)\|\cdot m(r; Q) \quad (0 \le |u| \le r). \qquad (11.4.1)$$
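The inequality $\|PQ\| \le n\|P\|\cdot\|Q\|$ for the norm $\|A\| = \max|a_{ij}|$ can be tested on random matrices. This is a sketch only: the random test matrices, the order n = 4 and the seed are our own choices.

The all-ones matrix J gives equality, so the factor n cannot be lowered. Indeed, each element of $J^2$ equals n.

```python
import random


def mat_mul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def max_norm(A):
    """||A|| = max |a_ij|, the norm used in the text."""
    return max(abs(x) for row in A for x in row)


rng = random.Random(0)   # fixed seed so the run is reproducible
n = 4
worst_ratio = 0.0
for _ in range(500):
    P = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
    Q = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
    ratio = max_norm(mat_mul(P, Q)) / (n * max_norm(P) * max_norm(Q))
    worst_ratio = max(worst_ratio, ratio)

# Equality case: for the all-ones matrix J, ||J J|| = n = n ||J||^2.
J = [[1.0] * n for _ in range(n)]
print(worst_ratio <= 1.0, max_norm(mat_mul(J, J)) == n * max_norm(J) ** 2)
```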



It will be recalled that the series

$$\sum_{k=0}^{\infty} A_k(t) \qquad (11.4.2)$$

is said to converge if each of the series

$$\sum_{k=0}^{\infty} \{A_k(t)\}_{ij} \quad (i, j = 1, \dots, n)$$

is convergent. If, in addition, all these series converge uniformly in $t_1 \le t \le t_2$, then the series (11.4.2) converges uniformly in that interval. This is certainly the case if, for $k \ge 0$,

$$\|A_k(t)\| \le c_k \quad (t_1 \le t \le t_2)$$

and $\sum_{k=0}^{\infty} c_k$ is a convergent series.

11.4.2. We propose to discuss systems of linear differential equations† of the form

$$\frac{dx_1}{dt} = a_{11}(t)x_1 + \dots + a_{1n}(t)x_n + b_1(t),$$
$$\vdots$$
$$\frac{dx_n}{dt} = a_{n1}(t)x_1 + \dots + a_{nn}(t)x_n + b_n(t). \qquad (11.4.3)$$

Here the $a_{ij}(t)$ and $b_i(t)$ are given functions and we wish to determine whether there exist functions $x_1(t), \dots, x_n(t)$ satisfying (11.4.3) and also some set of initial conditions and, if so, whether these functions are unique. The result we shall establish is as follows.

Theorem 11.4.1. If $a_{ij}(t)$, $b_i(t)$ $(i, j = 1, \dots, n)$ are continuous functions of t, and $c_1, \dots, c_n$ are any constants, then there exists one and only one set of functions $x_1(t), \dots, x_n(t)$ satisfying the system (11.4.3) with the initial conditions

$$x_1(0) = c_1, \ \dots, \ x_n(0) = c_n. \qquad (11.4.4)$$

We begin by restating the problem in terms of matrices.‡ Let

$$A(t) = (a_{ij}(t)), \qquad b(t) = (b_1(t), \dots, b_n(t))^T, \qquad x = x(t) = (x_1(t), \dots, x_n(t))^T.$$

† We intend here to do no more than consider an isolated problem. For a systematic treatment of differential equations by the methods of matrix algebra see Frazer, Duncan, and Collar, 16, chaps. v-vii, and Lefschetz, Lectures on Differential Equations. The reader's attention is also drawn to the remarks in Perlis, 6, 138-42 and 166-6.

‡ I am indebted to Professor H. A. Heilbronn for the proof given below.




With this notation (11.4.3) and (11.4.4) become respectively

$$\dot{x} = A(t)x + b(t), \qquad (11.4.5)$$

$$x(0) = (c_1, \dots, c_n)^T. \qquad (11.4.6)$$

We have, therefore, to show that there exists a unique vector x(t) which satisfies (11.4.5) and for which x(0) is prescribed.

Assume, for the moment, that there exists an $n \times n$ matrix M(t) such that

$$\dot{M}(t) = -M(t)A(t) \quad (\text{all } t), \qquad (11.4.7)$$

$$|M(t)| \ne 0 \quad (\text{all } t). \qquad (11.4.8)$$

If (11.4.5) possesses a solution x, then

$$\frac{d}{dt}\{M(t)x\} = \dot{M}(t)x + M(t)\dot{x} = -M(t)A(t)x + M(t)\{A(t)x + b(t)\} = M(t)b(t),$$

and therefore

$$M(t)x = \int_0^t M(u)b(u)\,du + M(0)x(0),$$

$$x(t) = \{M(t)\}^{-1}\Bigl\{\int_0^t M(u)b(u)\,du + M(0)x(0)\Bigr\}. \qquad (11.4.9)$$

Thus, if (11.4.5) is soluble subject to (11.4.6), then the solution is given by (11.4.9) and so is unique. On the other hand, it can be verified at once that the vector x(t) defined by (11.4.9) satisfies (11.4.5) and (11.4.6). The proof of the theorem will therefore be complete if we can establish the existence of a matrix M(t) satisfying (11.4.7) and (11.4.8). We recognize, of course, that M(t) plays the part of an 'integrating factor' of the equation (11.4.5).
We write

$$M_0(t) = I; \qquad M_{k+1}(t) = \int_0^t M_k(u)A(u)\,du \quad (k = 0, 1, 2, \dots).$$

These recurrence relations define a sequence $\{M_k(t)\}$ of matrices. Each of these matrices is continuous and, indeed, differentiable; and we have

$$\dot{M}_{k+1}(t) = M_k(t)A(t) \quad (k = 0, 1, 2, \dots). \qquad (11.4.10)$$

We now assert, for all $r \ge 0$, the inequality

$$\|M_k(t)\| \le \frac{1}{k!}\{n|t|\,m(r; A)\}^k \quad (|t| \le r;\ k = 0, 1, 2, \dots). \qquad (11.4.11)$$

For k = 0 this is true trivially. Assume that the inequality holds for some $k \ge 0$. Using Exercise 11.4.1, we have

$$\|M_{k+1}(t)\| \le \Bigl|\int_0^t \|M_k(u)A(u)\|\,du\Bigr|.$$

Now in the integral on the right-hand side $0 \le |u| \le |t| \le r$ and so, by (11.4.1),

$$\|M_{k+1}(t)\| \le \Bigl|\int_0^t n\|M_k(u)\|\,m(r; A)\,du\Bigr|.$$

Hence, by the induction hypothesis,

$$\|M_{k+1}(t)\| \le \frac{\{n\,m(r; A)\}^{k+1}}{k!}\Bigl|\int_0^t |u|^k\,du\Bigr| = \frac{1}{(k+1)!}\{n|t|\,m(r; A)\}^{k+1}.$$

The inequality (11.4.11) is therefore proved, and it follows that

$$\|M_k(t)\| \le c_k \quad (|t| \le r;\ k = 0, 1, 2, \dots),$$

where $c_k = \frac{1}{k!}\{nr\,m(r; A)\}^k$. Now $\sum_{k=0}^{\infty} c_k$ is an exponential series and so is convergent. Therefore, for any $r \ge 0$, the series

$$M_0(t) - M_1(t) + M_2(t) - M_3(t) + \dots$$

converges uniformly in the interval $|t| \le r$. Denoting the sum of this series by M(t), differentiating formally term by term, and using (11.4.10), we obtain

$$\dot{M}_0(t) - \dot{M}_1(t) + \dot{M}_2(t) - \dots = -\{M_0(t) - M_1(t) + M_2(t) - \dots\}A(t). \qquad (11.4.12)$$

Now since A(t) is bounded for $|t| \le r$, it follows that the series (11.4.12) is uniformly convergent in that interval. Hence M(t) is differentiable for $|t| < r$, and its derivative is given by (11.4.12). But r can be chosen arbitrarily large; hence M(t) is everywhere differentiable and satisfies (11.4.7).

We next consider the sequence $\{N_k(t)\}$ defined by the relations

$$N_0(t) = I; \qquad N_{k+1}(t) = \int_0^t A(u)N_k(u)\,du \quad (k = 0, 1, 2, \dots). \qquad (11.4.13)$$

By an argument analogous to that just used it follows that the series

$$N(t) = N_0(t) + N_1(t) + N_2(t) + \dots \qquad (11.4.14)$$

satisfies the relations

$$\dot{N}(t) = A(t)N(t), \qquad N(0) = I.$$

Hence

$$\frac{d}{dt}\{M(t)N(t)\} = \dot{M}(t)N(t) + M(t)\dot{N}(t) = -M(t)A(t)\cdot N(t) + M(t)\cdot A(t)N(t) = O,$$

and therefore $M(t)N(t) = M(0)N(0) = I$.

Thus M(t) possesses an inverse for all values of t, and (11.4.8) is satisfied. The theorem is therefore proved.
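The construction of M(t) and N(t) can be imitated numerically. The sketch below is only an approximation, under assumptions of ours:

- each integral in the recurrences for $M_k$ and in (11.4.13) is replaced by the trapezoidal rule on a uniform grid over [0, T];
- the series are truncated after 20 terms;
- A(t) is a hypothetical continuous coefficient matrix.

The product M(T)N(T) should then be close to I. For a constant matrix A, the sum N(T) should be close to $\exp(TA)$.

```python
def mat_mul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def mat_add(A, B, s=1.0):
    """Return A + s*B."""
    return [[a + s * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def cumulative_trapezoid(samples, h):
    """Running integrals from 0 of matrix samples taken at spacing h."""
    n = len(samples[0])
    out = [[[0.0] * n for _ in range(n)]]
    for j in range(1, len(samples)):
        out.append(mat_add(out[-1], mat_add(samples[j - 1], samples[j]), 0.5 * h))
    return out


def build_M_N(A_of_t, T=1.0, steps=1000, terms=20):
    """Approximate M(T) = M_0 - M_1 + ... and N(T) = N_0 + N_1 + ... on a grid."""
    h = T / steps
    A_vals = [A_of_t(j * h) for j in range(steps + 1)]
    I = identity(len(A_vals[0]))
    M_k = [I] * (steps + 1)  # M_0(t) = I at every grid point
    N_k = [I] * (steps + 1)  # N_0(t) = I at every grid point
    M, N = I, I              # partial sums at t = T
    for k in range(1, terms):
        # M_k(t) = int_0^t M_{k-1}(u) A(u) du,  N_k(t) = int_0^t A(u) N_{k-1}(u) du
        M_k = cumulative_trapezoid([mat_mul(m, a) for m, a in zip(M_k, A_vals)], h)
        N_k = cumulative_trapezoid([mat_mul(a, v) for a, v in zip(A_vals, N_k)], h)
        M = mat_add(M, M_k[-1], (-1.0) ** k)
        N = mat_add(N, N_k[-1])
    return M, N


def max_dev(X, Y):
    return max(abs(x - y) for rx, ry in zip(X, Y) for x, y in zip(rx, ry))


# Hypothetical time-dependent coefficients A(t) = [[0, 1], [-t, 0]]
M, N = build_M_N(lambda t: [[0.0, 1.0], [-t, 0.0]])
print(max_dev(mat_mul(M, N), identity(2)))

# Constant coefficients: N(1) should approximate exp(A) = sum A^k / k!
A_const = [[0.0, 1.0], [-2.0, -3.0]]
_, N_const = build_M_N(lambda t: A_const)
exp_A, term = identity(2), identity(2)
for k in range(1, 30):
    term = [[x / k for x in row] for row in mat_mul(term, A_const)]
    exp_A = mat_add(exp_A, term)
print(max_dev(N_const, exp_A))
```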


The solution of the system of differential equations (11.4.3) reduces to a particularly simple form when all $a_{ij}(t)$ are constants and all $b_i(t)$ vanish. In this case we have the following result.

Theorem 11.4.2. The system of differential equations

$$\frac{dx_1}{dt} = a_{11}x_1 + \dots + a_{1n}x_n,$$
$$\vdots$$
$$\frac{dx_n}{dt} = a_{n1}x_1 + \dots + a_{nn}x_n, \qquad (11.4.15)$$

subject to the initial conditions (11.4.4), possesses a unique solution given by

$$x(t) = \exp(tA)\cdot x(0), \qquad (11.4.16)$$

where $A = (a_{ij})$, $x(t) = (x_1(t), \dots, x_n(t))^T$, $x(0) = (c_1, \dots, c_n)^T$.

The uniqueness of the solution is guaranteed by Theorem 11.4.1. Moreover, since here b(t) vanishes and M(0) = I, we infer from (11.4.9) that this solution is given by

$$x(t) = \{M(t)\}^{-1}x(0) = N(t)x(0),$$

where N(t) is defined by (11.4.14). From (11.4.13) we obtain at once $N_k(t) = \frac{t^k}{k!}A^k$ $(k = 0, 1, 2, \dots)$. Hence $N(t) = \exp(tA)$, and (11.4.16) follows.

It is even easier to verify directly that x(t), as defined by (11.4.16), satisfies (11.4.15). For we have, by Theorem 11.2.7,

$$\dot{x}(t) = A\exp(tA)x(0) = A\cdot x(t),$$

and this is equivalent to (11.4.15).
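Formula (11.4.16) can be tried on a small system whose solution is known in closed form. The example is hypothetical: $dx_1/dt = x_2$, $dx_2/dt = -x_1$ with $x_1(0) = 1$, $x_2(0) = 0$, whose solution is $x_1 = \cos t$, $x_2 = -\sin t$. The exponential is a truncated series, an assumption adequate for small $|t|\,\|A\|$.

```python
import math


def mat_mul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def mat_exp(A, terms=40):
    """Truncated exponential series: sum of A**m / m! for m < terms."""
    n = len(A)
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    total = [row[:] for row in term]
    for m in range(1, terms):
        term = [[x / m for x in row] for row in mat_mul(term, A)]
        total = [[s + x for s, x in zip(rs, rx)] for rs, rx in zip(total, term)]
    return total


def solve_constant_system(A, x0, t):
    """x(t) = exp(tA) x(0), as in (11.4.16); x0 lists the initial values c_1, ..., c_n."""
    E = mat_exp([[t * a for a in row] for row in A])
    return [sum(E[i][j] * x0[j] for j in range(len(x0))) for i in range(len(x0))]


A = [[0.0, 1.0], [-1.0, 0.0]]   # dx1/dt = x2, dx2/dt = -x1 (hypothetical example)
t = 1.3
x = solve_constant_system(A, [1.0, 0.0], t)
print(x, [math.cos(t), -math.sin(t)])
```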






Exercise 11.4.2. State the case n = 1 of Theorem 11.4.2. 

Exercise 11.4.3. Obtain a proof of Theorem 11.4.2, independent of 
Theorem 11.4.1, for the case when A is similar to a diagonal matrix. 


PROBLEMS ON CHAPTER XI 


1. Let $\phi(A)$ be a matrix function. Show that $\phi(A^T) = \{\phi(A)\}^T$.

2. Show that, for every matrix A, $|\exp A| = \exp(\operatorname{tr} A)$. Deduce that, for a skew-symmetric matrix A, $|\exp A| = 1$.

3. Let $\lambda_1, \dots, \lambda_n$ be the characteristic roots of A, and let $\phi(A)$ be a function of A. Show that the characteristic roots of $\phi(A)$ are $\phi(\lambda_1), \dots, \phi(\lambda_n)$.

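4. If

$$A = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix},$$

show that

$$\exp A = \frac{1}{3}\begin{pmatrix} l & m & n \\ n & l & m \\ m & n & l \end{pmatrix},$$

where $l = e + e^{\rho} + e^{\rho^2}$, $m = e + \rho^2 e^{\rho} + \rho e^{\rho^2}$, $n = e + \rho e^{\rho} + \rho^2 e^{\rho^2}$ and $\rho = e^{2\pi i/3}$.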


where x is any complex number, show that 

exp{A(aj)} = eA(e®“^). 

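6. If

$$A = \begin{pmatrix} 0 & 1 & 0 \\ 2 & 0 & 2 \\ 0 & 1 & 0 \end{pmatrix},$$

show that

$$\exp A = \begin{pmatrix} a^2 & ab & b^2 \\ 2ab & a^2 + b^2 & 2ab \\ b^2 & ab & a^2 \end{pmatrix},$$

where $a = \cosh 1$, $b = \sinh 1$.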

7. Show that if A is a real skew-symmetric matrix (skew-hermitian 
matrix), then exp A is orthogonal (unitary). 

8. Prove that if A is hermitian, then exp(iA) is unitary; and interpret 
this result for the case n = 1. 

9. Show that, in general, the relation

$$\frac{d}{dt}\{A(t)\}^m = m\{A(t)\}^{m-1}\dot{A}(t)$$

is not valid. Under what conditions is it valid?

10. Show that every unitary matrix U can be represented in the form 
U = exp(iH), where H is hermitian. 

11. For any complex matrix $A = (a_{rs})$, let N(A) denote the positive square root of $\sum_{r,s=1}^{n} |a_{rs}|^2$. Show that the matrix series $\sum_{\nu=0}^{\infty} A_\nu$ converges absolutely if and only if $\sum_{\nu=0}^{\infty} N(A_\nu)$ converges.





12. By differentiating the matrix $\exp(tA)\exp(-tA)$ with respect to t establish the identity $(\exp A)^{-1} = \exp(-A)$.

13. Let $A = (a_{rs})$, $B = (b_{rs})$ be $n \times n$ matrices. Show that the inequalities $|a_{rs}| \le |b_{rs}|$ $(r, s = 1, \dots, n)$ do not imply that $\rho(A) \le \rho(B)$.

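14. Let $\phi(z)$ be a power series and A the $n \times n$ matrix

$$A = \begin{pmatrix} \lambda & 1 & 0 & \cdots & 0 \\ 0 & \lambda & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ 0 & 0 & 0 & \cdots & \lambda \end{pmatrix}.$$

Show that the matrix power series $\phi(A)$ converges if and only if the scalar power series $\phi(z), \phi'(z), \dots, \phi^{(n-1)}(z)$ all converge for $z = \lambda$.

15. Evaluate exp A, when A is the matrix given in the preceding question.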
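16. Show that, if

$$S = \begin{pmatrix} 0 & \nu & -\mu \\ -\nu & 0 & \lambda \\ \mu & -\lambda & 0 \end{pmatrix},$$

then

$$\exp S = I + \frac{\sin\omega}{\omega}S + \frac{1-\cos\omega}{\omega^2}S^2$$

and

$$\sin S = \frac{\sinh\omega}{\omega}S,$$

where $\omega^2 = \lambda^2 + \mu^2 + \nu^2$. How are these results to be interpreted when $\omega = 0$?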

17. Assuming that $A_m \to A$ $(m \to \infty)$, where $|A| \ne 0$, show that $A_m$ is non-singular for all sufficiently large values of m and that $(A_m)^{-1} \to A^{-1}$.

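18. The characteristic roots of a matrix A of order 3 are $\tfrac{1}{2}\pi$, $\pi$, $-\pi$. Prove, by induction with respect to n, that

$$A^{2n+2} = \tfrac{1}{3}\pi^{2n}(4 - 2^{-2n})A^2 - \tfrac{1}{3}\pi^{2n+2}(1 - 2^{-2n})I.$$

Deduce that

$$\sin A = \frac{4}{3\pi^2}(\pi^2 I - A^2).$$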

19. If the characteristic roots of A are the nth roots of unity, prove that
(A*-6A+6I)- = y 


Jfe=0 


20. The characteristic roots of a matrix A are the nth roots of unity. Prove that, for $|c| < 1$,

$$(I - cA)^{-2} = (1 - c^n)^{-2}\sum_{k=0}^{n-1} c^k\{(k+1) + (n-k-1)c^n\}A^k.$$

21. Deduce the first inequality in Theorem 7.6.3 (p. 211) from Theorem 11.1.2 (p. 329).

22. Show that if $\lim_{m\to\infty} A^m$ exists, then it is equal to a polynomial in A.

23. Show that, if k is a positive integer and A is a matrix all of whose characteristic roots are less than 1 in modulus, then

$$(I - A)^{-k} = \sum_{\nu=0}^{\infty}\binom{\nu+k-1}{k-1}A^\nu.$$









24. All elements above the diagonal of the matrix A are equal to 1 and 
all other elements are equal to 0. Show that 


” (jfc-i)’ 


where $\binom{p}{q}$ is interpreted as 0 when p < q.

25. Determine all skew-symmetric matrices S which satisfy the equation 
exp S = A, where A is a given proper orthogonal 2x2 matrix. 

26. Show that, if A is a proper orthogonal 3x3 matrix, then there exists 
a skew-symmetric matrix B such that A = expB. 

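27. Let $f_1(x), \dots, f_n(x), g(x)$ be continuous functions. Show that there exists one and only one function y of x, with prescribed values for

$$y, \ \frac{dy}{dx}, \ \dots, \ \frac{d^{n-1}y}{dx^{n-1}}$$

when x = 0, and satisfying the differential equation

$$\frac{d^ny}{dx^n} + f_1(x)\frac{d^{n-1}y}{dx^{n-1}} + \dots + f_{n-1}(x)\frac{dy}{dx} + f_n(x)y = g(x).$$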


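28. Show that the solution of the differential equation

$$\frac{d^ny}{dx^n} + a_1\frac{d^{n-1}y}{dx^{n-1}} + \dots + a_{n-1}\frac{dy}{dx} + a_ny = Q(x)$$

($a_1, \dots, a_n$ constants; Q(x) a continuous function), subject to the initial conditions

$$y = \frac{dy}{dx} = \dots = \frac{d^{n-1}y}{dx^{n-1}} = 0 \quad (x = 0),$$

can be written in the form

$$y = \int_0^x \{\exp(tA)\}_{1n}\,Q(x-t)\,dt,$$

where

$$A = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ -a_n & -a_{n-1} & -a_{n-2} & \cdots & -a_1 \end{pmatrix}.$$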


29. Let a, b, c, d be real numbers, and suppose that b, c are not both zero. Prove that all solutions of the system

$$\ddot{x} = ax + by, \qquad \ddot{y} = cx + dy$$

are bounded if and only if the characteristic roots of $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ are negative and distinct.

30. Show that there exists one and only one set of functions $x_1(t), \dots, x_n(t)$ satisfying the system of differential equations

$$\frac{d^2x_i}{dt^2} = a_{i1}x_1 + \dots + a_{in}x_n \quad (i = 1, \dots, n)$$

and having prescribed values for $x_1(0), \dots, x_n(0), \dot{x}_1(0), \dots, \dot{x}_n(0)$. Show further that these functions are specified by the identity

$$x(t) = \phi(t^2A)x(0) + t\psi(t^2A)\dot{x}(0),$$

where $x(t) = (x_1(t), \dots, x_n(t))^T$, $A = (a_{ij})$, and

$$\phi(z) = \sum_{m=0}^{\infty}\frac{z^m}{(2m)!}, \qquad \psi(z) = \sum_{m=0}^{\infty}\frac{z^m}{(2m+1)!}.$$

If the characteristic roots of A are distinct negative numbers, prove that all solutions of the given system of differential equations are bounded.

31. Use the preceding question to find the solution of the equations

$$\frac{d^2u}{dt^2} + 2u + v + w = 0, \qquad \frac{d^2v}{dt^2} + u + 2v + w = 0, \qquad \frac{d^2w}{dt^2} + u + v + 2w = 0,$$


such that u — V — w 





= c when ^ = 0. 


32. Show that the general solution of the equation $\exp X = I$ can be written in the form

$$X = S\,\mathrm{dg}(2\pi ik, 2\pi il)\,S^{-1},$$

where S is an arbitrary non-singular $2 \times 2$ matrix and k, l are arbitrary integers.

33. Let $a_1, \dots, a_n$ be complex and $b_1, \dots, b_n$ real non-negative numbers, and suppose that $|a_k| \le b_k$ $(k = 1, \dots, n)$. Using Theorem 11.1.2 (p. 329) and Problem VII, 32 show that, if $\lambda$ denotes a root of greatest modulus of the equation $x^n + a_1x^{n-1} + \dots + a_n = 0$ and $\mu$ denotes a root of greatest modulus of the equation $x^n - b_1x^{n-1} - \dots - b_n = 0$, then $|\lambda| \le |\mu|$.

34. Let $f(A) = \sum_{\nu=0}^{\infty} c_\nu A^\nu$ be a convergent power series in the matrix A. Show that, for 0 < t < 1,

$$f(A; t) = \sum_{\nu=0}^{\infty} c_\nu t^\nu A^\nu$$

is again a convergent power series in A, and that

$$\lim_{t\to1-0} f(A; t) = f(A).$$

35. A is a matrix with positive elements and characteristic roots $1, \omega_2, \dots, \omega_n$, where $|\omega_2| < 1, \dots, |\omega_n| < 1$. Show that there exists a non-singular matrix T such that $T^{-1}AT$ is of the form



where b is a row vector of order $n-1$ and B is an $(n-1) \times (n-1)$ matrix. Prove that, for $m \to \infty$,



By considering the vector $x = (\lim_{m\to\infty} A^m)y$, where y is any vector with positive components, show that A possesses a characteristic vector with positive components associated with the characteristic root 1.





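36. The characteristic roots of the matrix A are $\omega_1, \dots, \omega_n$, where $|\omega_1| > |\omega_k|$ $(k = 2, \dots, n)$. Show that $\lim_{m\to\infty}(\omega_1^{-1}A)^m$ exists and is non-zero; and deduce that, for a suitable vector x, $\lim_{m\to\infty}(\omega_1^{-1}A)^m x$ is a characteristic vector of A associated with $\omega_1$.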

37. A is a real matrix with positive determinant. Show that there exists a matrix A(t), continuous for $0 \le t \le 1$, and such that

$$A(0) = A, \qquad A(1) = I, \qquad |A(t)| \ne 0 \quad (0 \le t \le 1).$$

Show further that the matrix $A(t) = (1-t)A + tI$ has the required properties if and only if no characteristic root of A is real and negative.

38. Let A be a proper orthogonal matrix. Show, by using Problem 25 and 
Problem X, 44, that there exists a real skew-symmetric matrix S such 
that A = exp S. 

39. Let A; > 1 and suppose that the characteristic roots of a 

matrix A satisfy the conditions 

l^il ~ 1 (t = 1,^ {i ~ k-\- l,...,tl). 

Show, by using Problem IX, 32, that, as m ->• oo, all elements of A^ are of 
the form 



PART III 


QUADRATIC FORMS 

XII 

BILINEAR, QUADRATIC, AND 
HERMITIAN FORMS 

We have now carried the discussion of matrices as far as we intend 
to, and in this chapter and the next we shall apply the results 
previously obtained to the study of quadratic forms and related 
topics. 

12.1. Operators and forms of the bilinear and quadratic types

12.1.1. It is necessary at this stage to make use of some of the properties of linear manifolds derived in Chapters II and IV.

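Definition 12.1.1. Let $\mathfrak{M}$ and $\mathfrak{N}$ be two linear manifolds† over a field $\mathfrak{F}$, let $\phi(X, Y)$ be a function of two variables X, Y which are elements of $\mathfrak{M}$, $\mathfrak{N}$ respectively, and suppose that the functional values of $\phi(X, Y)$ are numbers of $\mathfrak{F}$. If $\phi(X, Y)$ is linear in both variables, i.e. if for all $\alpha \in \mathfrak{F}$, $X, X' \in \mathfrak{M}$, $Y, Y' \in \mathfrak{N}$ we have

$$\left.\begin{aligned} \phi(\alpha X, Y) &= \alpha\phi(X, Y) = \phi(X, \alpha Y), \\ \phi(X + X', Y) &= \phi(X, Y) + \phi(X', Y), \\ \phi(X, Y + Y') &= \phi(X, Y) + \phi(X, Y'), \end{aligned}\right\} \qquad (12.1.1)$$

then $\phi(X, Y)$ is called a bilinear operator on $\mathfrak{M}$ and $\mathfrak{N}$.

Exercise 12.1.1. Show that the relation

$$\phi(\alpha X + \alpha' X', \beta Y + \beta' Y') = \alpha\beta\phi(X, Y) + \alpha\beta'\phi(X, Y') + \alpha'\beta\phi(X', Y) + \alpha'\beta'\phi(X', Y')$$

holds for all $\alpha, \alpha', \beta, \beta' \in \mathfrak{F}$, $X, X' \in \mathfrak{M}$, $Y, Y' \in \mathfrak{N}$ if and only if (12.1.1) is satisfied.

In the discussion below we shall write

$$d(\mathfrak{M}) = m, \qquad d(\mathfrak{N}) = n.$$

Since linear operators on linear manifolds possess matrix representations, it is natural to seek some such representation for bilinear operators.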

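† The possibility that $\mathfrak{M}$ and $\mathfrak{N}$ are identical is not, of course, excluded.

Theorem 12.1.1. Let $\mathfrak{B} = \{E_1, \dots, E_m\}$, $\mathfrak{C} = \{F_1, \dots, F_n\}$ be bases of $\mathfrak{M}$, $\mathfrak{N}$ respectively. If $\phi(X, Y)$ is a bilinear operator on $\mathfrak{M}$ and $\mathfrak{N}$, and if x and y are the vectors representing X and Y with respect to $\mathfrak{B}$ and $\mathfrak{C}$ respectively, then

$$\phi(X, Y) = x^TAy,$$

where the $m \times n$ matrix $A = (a_{rs})$ is defined by the relations

$$a_{rs} = \phi(E_r, F_s) \quad (r = 1, \dots, m;\ s = 1, \dots, n). \qquad (12.1.2)$$

The assertion is virtually obvious. Writing

$$x = (x_1, \dots, x_m)^T, \qquad y = (y_1, \dots, y_n)^T,$$

we have

$$X = \sum_{r=1}^{m} x_r E_r, \qquad Y = \sum_{s=1}^{n} y_s F_s,$$

and so, by (12.1.1),

$$\phi(X, Y) = \sum_{r=1}^{m}\sum_{s=1}^{n} x_r y_s \phi(E_r, F_s) = \sum_{r=1}^{m}\sum_{s=1}^{n} a_{rs}x_r y_s = x^TAy.$$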

Matrix products of the type $x^TAy$ will clearly play a prominent role in the theory of bilinear operators.

Definition 12.1.2. Any polynomial

$$\sum_{r=1}^{m}\sum_{s=1}^{n} a_{rs}x_r y_s = x^TAy, \qquad (12.1.3)$$

where $A = (a_{rs})$, $x = (x_1, \dots, x_m)^T$, $y = (y_1, \dots, y_n)^T$, is called a bilinear form in the two sets of variables $x_1, \dots, x_m$ and $y_1, \dots, y_n$. The $m \times n$ matrix A is called the matrix of this bilinear form.

There is thus clearly a biunique correspondence between rectangular matrices and bilinear forms.

Definition 12.1.3. If, with the notation of Theorem 12.1.1, $\phi(X, Y) = x^TAy$, then the bilinear form $x^TAy$ (or alternatively the matrix A) is said to represent the bilinear operator $\phi(X, Y)$ with respect to the bases $\mathfrak{B}$, $\mathfrak{C}$.

If $\mathfrak{M}$ and $\mathfrak{N}$ are the same manifold and $\mathfrak{B}$ and $\mathfrak{C}$ the same basis, we say that $x^TAy$ represents $\phi(X, Y)$ with respect to $\mathfrak{B}$.

Theorem 12.1.2. For any fixed choice of bases $\mathfrak{B}$, $\mathfrak{C}$ in $\mathfrak{M}$, $\mathfrak{N}$ respectively, there is a biunique correspondence between bilinear operators (on $\mathfrak{M}$ and $\mathfrak{N}$) and their representing matrices.

This result is implicit in our previous remarks. For to a given bilinear operator $\phi(X, Y)$ corresponds the representing matrix A defined by (12.1.2). On the other hand, if A is given, then the bilinear operator $\phi$ which it represents with respect to $\mathfrak{B}$, $\mathfrak{C}$ is uniquely determined by (12.1.2) and (12.1.1).


The representation of a bilinear operator by a bilinear form (or by a matrix) depends, of course, on the arbitrary choice of bases in $\mathfrak{M}$ and $\mathfrak{N}$. The relation between bilinear operators and bilinear forms is in this respect analogous to that between the elements of a linear manifold and vectors, or that between linear operators and matrices.


Theorem 12.1.3. The $m \times n$ matrices A and B represent the same bilinear operator with respect to suitable pairs of bases if and only if there exist non-singular matrices P, Q, of order m, n respectively, such that

$$B = P^TAQ. \qquad (12.1.4)$$

Suppose, in the first place, that A represents the bilinear operator $\phi$ with respect to $\mathfrak{B}$, $\mathfrak{C}$ and B represents the same operator with respect to $\mathfrak{B}'$, $\mathfrak{C}'$. Then

$$\phi(X, Y) = x^TAy, \qquad (12.1.5)$$

where x, y are the vectors representing X, Y with respect to $\mathfrak{B}$, $\mathfrak{C}$ respectively. (12.1.6)

Write x', y' for the vectors representing X, Y with respect to $\mathfrak{B}'$, $\mathfrak{C}'$ respectively. (12.1.7)

Then, by Theorem 4.1.2 (i) (p. 112), there exist non-singular matrices P, Q, of order m, n respectively, and independent of X, Y, such that

$$x = Px', \qquad y = Qy'. \qquad (12.1.8)$$

Hence, by (12.1.5),

$$\phi(X, Y) = (Px')^TA(Qy') = x'^T(P^TAQ)y',$$

and so $P^TAQ$ represents $\phi$ with respect to $\mathfrak{B}'$, $\mathfrak{C}'$. Hence (12.1.4) follows.

On the other hand, let A and B be given matrices connected by the relation (12.1.4), where P and Q are non-singular. Let $\mathfrak{B}$, $\mathfrak{C}$ be any bases in $\mathfrak{M}$, $\mathfrak{N}$ respectively. By Theorem 4.1.2 (ii), there exist bases $\mathfrak{B}'$, $\mathfrak{C}'$ in $\mathfrak{M}$, $\mathfrak{N}$ respectively such that, if x, y are defined by (12.1.6) and x', y' by (12.1.7), then (12.1.8) is valid. Let now $\phi$ be the (uniquely determined) bilinear operator represented by A with respect to $\mathfrak{B}$, $\mathfrak{C}$. Then, by (12.1.8),

$$\phi(X, Y) = x^TAy = x'^TBy',$$


and therefore B represents the same operator $\phi$ with respect to $\mathfrak{B}'$, $\mathfrak{C}'$.
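The computation in the proof of Theorem 12.1.3 can be replayed with concrete integers. The matrices A, P, Q and the coordinate vectors below are hypothetical choices of ours, with P and Q non-singular. For each pair of new coordinates x', y', the sketch forms x = Px', y = Qy' as in (12.1.8) and compares $x^TAy$ with $x'^TBy'$, where $B = P^TAQ$.

```python
def mat_mul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(M):
    return [list(col) for col in zip(*M)]


def apply(M, v):
    """Matrix-vector product M v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def bilinear_form(x, A, y):
    """x^T A y."""
    return sum(x[r] * A[r][s] * y[s] for r in range(len(x)) for s in range(len(y)))


A = [[1, 2, -1], [0, 0, -2]]            # hypothetical representing matrix (m = 2, n = 3)
P = [[1, 1], [0, 1]]                    # non-singular, order m (hypothetical)
Q = [[1, 0, 2], [0, 1, 0], [1, 0, 1]]   # non-singular, order n (determinant -1)
B = mat_mul(mat_mul(transpose(P), A), Q)  # B = P^T A Q, as in (12.1.4)

agree = True
for x_new in ([1, 0], [3, -2], [-1, 5]):
    for y_new in ([1, 4, -1], [0, 0, 1], [2, -3, 7]):
        x_old, y_old = apply(P, x_new), apply(Q, y_new)   # x = P x', y = Q y'
        agree = agree and bilinear_form(x_old, A, y_old) == bilinear_form(x_new, B, y_new)
print(agree)
```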

Definition 12.1.4. If P, Q are non-singular matrices, then the substitutions $x = Px'$, $y = Qy'$ for the variables in the bilinear form (12.1.3) are jointly called a non-singular linear transformation of the bilinear form.

Exercise 12.1.2. Show that, with the obvious definition of the rule of composition, the non-singular linear transformations of a bilinear form constitute a group.

Our preceding remarks show that if $\phi$ is a given bilinear operator on $\mathfrak{M}$ and $\mathfrak{N}$, then a change of bases in $\mathfrak{M}$, $\mathfrak{N}$ induces a non-singular linear transformation of the representing bilinear form, and conversely.

For many purposes the notion of a bilinear operator is best set 
aside in favour of the notion of a bilinear form, since matrix theory 
provides us with a ready technique for handling such forms. It 
should, nevertheless, be realized that our concern is not so much 
with bilinear forms as with the underlying bilinear operators. 
For this reason properties common to all bilinear forms representing the same bilinear operator are of particular interest since they
express intrinsic features of that operator. One such property is 
that of rank. 

Definition 12.1.5. The rank of the bilinear form $x^TAy$ is the rank of A.

By Theorems 12.1.3 and 5.6.3 (p. 160) it follows that any two bilinear forms representing the same bilinear operator have equal rank.† We may therefore introduce the following definition.

Definition 12.1.6. The rank of a bilinear operator is the rank of any bilinear form which represents it.‡
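The invariance of rank under $A \mapsto P^TAQ$ can be checked with exact rational arithmetic. The sketch below computes ranks by Gaussian elimination over `fractions.Fraction`. The matrices A, P, Q are hypothetical examples, with P and Q non-singular.

```python
from fractions import Fraction


def mat_mul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(M):
    return [list(col) for col in zip(*M)]


def rank(M):
    """Rank by Gaussian elimination over the rationals (exact arithmetic)."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


A = [[1, 2, 3], [2, 4, 6]]              # hypothetical matrix of rank 1
P = [[2, 1], [1, 1]]                    # non-singular (determinant 1)
Q = [[1, 0, 2], [0, 1, 0], [1, 0, 1]]   # non-singular (determinant -1)
B = mat_mul(mat_mul(transpose(P), A), Q)
print(B, rank(A), rank(B))
```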

12.1.2. The most important subclass of bilinear operators is that of quadratic operators.

† By virtue of the theorem on the equivalence of matrices (Theorem 6.2.3, p. 176), the converse is also true, and two $m \times n$ matrices of equal rank represent the same bilinear operator with respect to suitable pairs of bases.

‡ For further discussion of bilinear operators and bilinear forms, see Jacobson, 20, chap. V, and Julia, 25, 110-16.



Definition 12.1.7. Let $\phi(X, Y)$ be a bilinear operator in which both variables belong to the same linear manifold $\mathfrak{N}$. If the variables are taken equal to each other, then the resulting function $\phi(X, X)$ is called a quadratic operator on $\mathfrak{N}$.

The dimensionality of $\mathfrak{N}$ will be denoted by n. As an immediate consequence of Theorem 12.1.1 we see that if $\mathfrak{B} = \{E_1, \dots, E_n\}$ is a basis of $\mathfrak{N}$ and x is the vector representing X with respect to $\mathfrak{B}$, then

$$\phi(X, X) = x^TAx = \sum_{r,s=1}^{n} a_{rs}x_r x_s,$$

where $x = (x_1, \dots, x_n)^T$ and $A = (a_{rs})$ is the $n \times n$ matrix defined by the equations $a_{rs} = \phi(E_r, E_s)$ $(r, s = 1, \dots, n)$.

If, by analogy with the usage of § 12.1.1, we were now to say that the quadratic function $\sum_{r,s=1}^{n} a_{rs}x_r x_s$ represents the quadratic operator $\phi(X, X)$, we would be setting up a correspondence between quadratic operators and quadratic functions which is not biunique (even for a fixed choice of basis in $\mathfrak{M}$), since if

$$b_{rs} + b_{sr} = a_{rs} + a_{sr} \quad (r, s = 1, \dots, n)$$

and $B = (b_{rs})$, then $x^TAx = x^TBx$, and so $\phi(X, X)$ can be represented equally well by $x^TAx$ and $x^TBx$. We can overcome the difficulty by insisting that the matrix A should be symmetric. There is no loss of generality in making this assumption, for if

$$a'_{rs} = \tfrac{1}{2}(a_{rs} + a_{sr}) \quad (r, s = 1, \dots, n),$$

then $A' = (a'_{rs})$ is symmetric and $\phi(X, X) = x^TA'x$. We shall, accordingly, always represent quadratic operators by expressions such as $x^TA'x$, where A' is symmetric. The insistence on symmetry enables us to set up a biunique correspondence between quadratic operators, quadratic forms (defined below), and symmetric matrices.
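The failure of biuniqueness, and its cure by symmetrization, can be seen on a binary example of our own. The example uses two hypothetical matrices:

- A is not symmetric;
- B differs from A but has $b_{rs} + b_{sr} = a_{rs} + a_{sr}$.

Both give the same quadratic function as the symmetric matrix $A' = (\tfrac12(a_{rs}+a_{sr}))$. The check runs over a small grid of integer points with exact arithmetic.

```python
from fractions import Fraction


def quad(x, A):
    """The quadratic function x^T A x."""
    n = len(x)
    return sum(x[r] * A[r][s] * x[s] for r in range(n) for s in range(n))


A = [[1, 4], [-2, 3]]   # hypothetical, not symmetric
B = [[1, 0], [2, 3]]    # b_rs + b_sr = a_rs + a_sr, yet B != A
A_sym = [[Fraction(A[r][s] + A[s][r], 2) for s in range(2)] for r in range(2)]  # a'_rs

points = [(x1, x2) for x1 in range(-3, 4) for x2 in range(-3, 4)]
same_values = all(quad(x, A) == quad(x, B) == quad(x, A_sym) for x in points)
print(A_sym, same_values)
```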

Definition 12.1.8. Any polynomial

$$\sum_{r,s=1}^{n} a_{rs}x_r x_s = a_{11}x_1^2 + \dots + a_{nn}x_n^2 + 2a_{12}x_1x_2 + \dots + 2a_{n-1,n}x_{n-1}x_n = x^TAx,$$

where $x = (x_1, \dots, x_n)^T$ and $A = (a_{rs})$ is a symmetric matrix, is called a quadratic form in the variables $x_1, \dots, x_n$. A is the matrix of this form and the numbers $a_{rs}$ are its coefficients.

It is clear that there is a biunique correspondence between quadratic forms and their matrices.

Quadratic forms in 2, 3, 4, ..., n variables are called binary, ternary, quaternary, ..., n-ary respectively.

Definition 12.1.9. If $\phi(X, X)$ is a quadratic operator on $\mathfrak{M}$ and $\phi(X, X) = x^TAx$, where A is symmetric and x is the vector representing X with respect to some basis $\mathfrak{B}$ of $\mathfrak{M}$, then the quadratic form $x^TAx$ (or, alternatively, the matrix A) is said to represent the quadratic operator $\phi(X, X)$ with respect to $\mathfrak{B}$.†

We leave it to the reader to prove the next statement.

Theorem 12.1.4. For every given basis $\mathfrak{B}$ of $\mathfrak{M}$ there is a biunique correspondence between quadratic operators on $\mathfrak{M}$ and the quadratic forms representing them.

The problem concerning the relation between different representations arises here as it does in the case of linear and bilinear operators.

Theorem 12.1.5. Two symmetric $n \times n$ matrices A and B represent the same quadratic operator with respect to suitable bases in $\mathfrak{M}$ if and only if they are congruent, i.e. $B = P^TAP$, where P is non-singular.

Suppose that the matrices A and B represent the same quadratic operator $\phi(X, X)$ with respect to bases $\mathfrak{B}$ and $\mathfrak{B}'$ (of $\mathfrak{M}$) respectively. Thus, if x and x' are the vectors representing X with respect to $\mathfrak{B}$ and $\mathfrak{B}'$ respectively, then

$$\phi(X, X) = x^TAx = x'^TBx'.$$

But $x = Px'$, where P is a non-singular $n \times n$ matrix, and so

$$x^TAx = x'^T(P^TAP)x'.$$

Thus the symmetric matrix $P^TAP$ represents $\phi(X, X)$ with respect to $\mathfrak{B}'$, i.e. $B = P^TAP$. We leave it to the reader to complete the proof of the theorem by establishing the converse.

Theorem 12.1.5 shows that when the basis in $\mathfrak{M}$ is changed, the matrix representing a quadratic operator is subjected to a congruence transformation. By way of contrast we recall that a change of basis in $\mathfrak{M}$ induces a similarity transformation in matrices representing linear operators.

† It is a common practice to designate quadratic operators and quadratic forms (and similarly bilinear operators and bilinear forms) by the same term. However, the distinction of name may serve to remind us of the difference of status of the two concepts.

One of the main reasons for studying quadratic forms is their 
usefulness in the geometry of conics and quadrics; and the relation 
between quadratic operators and quadratic forms can, indeed, be 
aptly illustrated in this context. If $Q$ is a given quadric, then, with 
respect to a system $S$ of projective coordinates $x_0, x_1, x_2, x_3$, its 
equation assumes the form 

$$\sum_{r,s=0}^{3} a_{rs}x_rx_s = x^{T}Ax = 0,$$

where $A = A^{T} = (a_{rs})$ and $x = (x_0, x_1, x_2, x_3)^{T}$. If any new system 
$S'$ of coordinates is taken, then the vectors $x$, $x'$ representing the same 
point with respect to the two systems are connected by a linear 
relation $x = Px'$, where $|P| \neq 0$; and so the equation of $Q$, with 
respect to $S'$, has the form 

$$x'^{T}(P^{T}AP)x' = 0.$$

We may thus think of the quadric $Q$ as being associated with a 
quadratic operator, and then its various equations (with respect to 
different systems of coordinates) will represent this operator in 
terms of quadratic forms.† We are, of course, interested primarily 
in the intrinsic (i.e. the geometrical) properties of $Q$ rather than 
in the peculiarities of any particular equation representing it. 
Similarly, in our discussion of quadratic forms in $n$ variables we 
shall be primarily concerned with those properties which are 
common to all quadratic forms representing the same quadratic 
operator, i.e. properties invariant under congruence transformations. 

Definition 12.1.10. A quadratic form is real (complex) if its 
coefficients belong to the real (complex) field. 

In future, unless the contrary is stated, all quadratic forms will 
be assumed to be real, and this will be taken to imply that the 
variables are also real-valued. 

Definition 12.1.11. The determinant $|A|$ of the matrix $A$ 
associated with a quadratic form $\phi$ is called the determinant of $\phi$. A 
quadratic form is said to be singular or non-singular according as its 
determinant vanishes or does not vanish. 

† This statement requires some qualification owing to the fact that, for any 
$k \neq 0$, the equations $x^{T}Ax = 0$ and $x^{T}(kA)x = 0$ represent the same quadric. 



360 BILINEAR, QUADRATIC, AND HERMITIAN FORMS XII, § 12.1 

Definition 12.1.12. (i) The rank $R(\phi)$ of a quadratic form $\phi$ is 
the rank of the matrix of $\phi$. (ii) The rank of a quadratic operator is 
the rank of any quadratic form representing that operator. 

In view of Theorem 12.1.5, the second part of this definition is 
unambiguous, since congruent matrices have the same rank. 

Definition 12.1.13. The substitution $x = Px'$ for the variables 
of a quadratic form $x^{T}Ax$ is called a linear transformation of the 
quadratic form. This transformation is said to be real or complex 
according as the matrix $P$ of the transformation is real or complex; 
it is said to be singular or non-singular according as the determinant 
$|P|$ of the transformation vanishes or does not vanish. 

We shall, generally speaking, be concerned with real non-singular 
linear transformations. 

Exercise 12.1.3. Show that a binary quadratic form in the variables 
$x$, $y$ may be written as $ax^2 + 2bxy + cy^2$. What is the result of applying the 
substitutions $x = \alpha x' + \beta y'$, $y = \gamma x' + \delta y'$ to the variables? 

We collect below a number of almost obvious results which are 
implicit in our previous discussion, the proof of which may be 
regarded as an exercise. 

Theorem 12.1.6. (i) The substitution $x = Px'$ changes the 
quadratic form associated with the matrix $A$ into that associated with 
the matrix $P^{T}AP$. 

(ii) A linear transformation has the effect of multiplying the 
determinant of a quadratic form by the square of the determinant of 
the transformation. 

(iii) The rank of a quadratic form is invariant under non-singular 
linear transformations. 

Finally, it may be observed that non-singular linear (real or 
complex) transformations of a quadratic form constitute a group. 
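Parts (ii) and (iii) of Theorem 12.1.6 can be confirmed on small examples with exact rational arithmetic. In this sketch the helper functions are our own and are not taken from the text. `det` and `rank` use ordinary Gaussian elimination over `fractions.Fraction`, so no rounding enters the computation.

```python
from fractions import Fraction


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(row) for row in zip(*X)]


def det(M):
    """Determinant by Gaussian elimination with exact fractions."""
    a = [[Fraction(v) for v in row] for row in M]
    n, d = len(a), Fraction(1)
    for c in range(n):
        p = next((r for r in range(c, n) if a[r][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d                      # a row interchange changes the sign
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            a[r] = [x - f * y for x, y in zip(a[r], a[c])]
    return d


def rank(M):
    """Rank by row reduction with exact fractions."""
    a = [[Fraction(v) for v in row] for row in M]
    r = 0
    for c in range(len(a[0])):
        p = next((i for i in range(r, len(a)) if a[i][c] != 0), None)
        if p is None:
            continue
        a[r], a[p] = a[p], a[r]
        for i in range(len(a)):
            if i != r and a[i][c] != 0:
                f = a[i][c] / a[r][c]
                a[i] = [x - f * y for x, y in zip(a[i], a[r])]
        r += 1
    return r


A = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]
P = [[1, 2, 0], [0, 1, 1], [1, 0, 1]]
B = matmul(transpose(P), matmul(A, P))     # the transformed form, by part (i)
assert det(B) == det(P) ** 2 * det(A)      # part (ii)
assert rank(B) == rank(A)                  # part (iii), P being non-singular
```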

Exercise 12.1.4. Show that if a quadratic form is subjected to a linear 
transformation specified by the matrix $P$ and the resulting quadratic form 
is then subjected to a linear transformation specified by $Q$, then the total 
effect of the two transformations is the same as if the original quadratic form 
were transformed by $PQ$. 
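A quick numerical check of Exercise 12.1.4 follows. It is again a sketch, and the helper names are hypothetical. Applying $x = Px'$ and then $x' = Qx''$ to $x^{T}Ax$ gives the same matrix as the single substitution $x = (PQ)x''$, because $(PQ)^{T}A(PQ) = Q^{T}(P^{T}AP)Q$.

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(row) for row in zip(*X)]


def transform(A, P):
    """Matrix of the quadratic form obtained from x^T A x by x = P x'."""
    return matmul(transpose(P), matmul(A, P))


A = [[1, 2], [2, 3]]
P = [[1, 1], [0, 2]]
Q = [[0, 1], [1, 1]]
two_steps = transform(transform(A, P), Q)   # first P, then Q
one_step = transform(A, matmul(P, Q))       # the single matrix PQ
assert two_steps == one_step
```

The order matters. With the convention $x = Px'$ the combined matrix is $PQ$, not $QP$.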

12.1.3. The algebraic treatment of the projective geometry of 
conics and quadrics, which is based ultimately on Joachimsthal’s 
equation, makes extensive use of a procedure whereby every 
quadratic form gives rise unambiguously to a symmetric bilinear 
form, i.e. to a bilinear form whose matrix is symmetric. The necessary 
process of symmetrization was, in effect, carried out on p. 367, 
but the relation just mentioned between quadratic forms and the 
associated bilinear forms has an invariant character and is best 
described in terms of operators. 

Let $\mathfrak{M}$ be a linear manifold of dimensionality $n$ and let $\phi(X,X)$ 
be a quadratic operator on $\mathfrak{M}$. If $\phi(X,Y)$ is a bilinear operator 
which gives rise to the quadratic operator $\phi(X,X)$ when the 
substitution $Y = X$ is made, then, in general, $\phi(X,Y) \neq \phi(Y,X)$. 
It is, however, easy to symmetrize $\phi(X,Y)$. By Definition 12.1.1 
(p. 363) the operator $\psi(X,Y)$, defined by the equation 

$$\psi(X,Y) = \tfrac{1}{2}\{\phi(X,Y) + \phi(Y,X)\}, \qquad (12.1.9)$$

is again bilinear; and it evidently satisfies the relations 

$$\psi(X,Y) = \psi(Y,X), \qquad (12.1.10)$$

$$\psi(X,X) = \phi(X,X), \qquad (12.1.11)$$

for all $X, Y \in \mathfrak{M}$. We express the identity (12.1.10) by calling 
$\psi(X,Y)$ a symmetric bilinear operator. It is, in fact, the only 
symmetric bilinear operator which gives rise to $\phi(X,X)$, i.e. which 
satisfies (12.1.11). For, by Definition 12.1.1 and (12.1.9), we at 
once obtain the identity 

$$\phi(X+\lambda Y, X+\lambda Y) = \phi(X,X) + 2\lambda\psi(X,Y) + \lambda^{2}\phi(Y,Y). \qquad (12.1.12)$$

If, now, $\psi'(X,Y)$ denotes a bilinear operator such that 

$$\psi'(X,Y) = \psi'(Y,X), \qquad \psi'(X,X) = \phi(X,X),$$

then, similarly, 

$$\phi(X+\lambda Y, X+\lambda Y) = \psi'(X+\lambda Y, X+\lambda Y) = \phi(X,X) + 2\lambda\psi'(X,Y) + \lambda^{2}\phi(Y,Y), \qquad (12.1.13)$$

and it follows by (12.1.12) and (12.1.13) that $\psi'(X,Y) = \psi(X,Y)$. 

The unique bilinear operator $\psi(X,Y)$ associated with the given 
quadratic operator $\phi(X,X)$ is called the polarized operator of $\phi(X,X)$. 
Its representation by a bilinear form is easily obtained. Let the 
quadratic form $x^{T}Ax$ represent $\phi(X,X)$ with respect to a basis 
$\mathfrak{B}$ of $\mathfrak{M}$ and denote by $\chi(X,Y)$ the bilinear operator represented by 
the bilinear form $x^{T}Ay$ with respect to $\mathfrak{B}$. Then clearly 

$$\chi(X,X) = x^{T}Ax = \phi(X,X).$$

Moreover, since $A$ is symmetric (and a $1\times 1$ matrix is equal to its transpose), 

$$\chi(X,Y) = x^{T}Ay = y^{T}Ax = \chi(Y,X).$$

Hence $\chi(X,Y)$ is, in fact, the polarized operator $\psi(X,Y)$ of $\phi(X,X)$. 

Thus, if the quadratic operator $\phi(X,X)$ is represented by the 
quadratic form $x^{T}Ax$, its polarized operator $\psi(X,Y)$ is represented, 
with respect to the same basis, by the bilinear form $x^{T}Ay$. The bilinear 
form $x^{T}Ay$ is called the polarized form of the quadratic form $x^{T}Ax$. 
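The polarized form can also be recovered from the values of the quadratic form alone. Putting $\lambda = 1$ in (12.1.12) gives $\psi(X,Y) = \tfrac{1}{2}\{\phi(X+Y,X+Y) - \phi(X,X) - \phi(Y,Y)\}$. The Python sketch below checks this. It also checks that symmetrizing a non-symmetric matrix $M$ as in (12.1.9) leaves the quadratic form unchanged while producing the matrix of the polarized form. The function names are illustrative assumptions.

```python
from fractions import Fraction


def bilinear(M, x, y):
    """Value of the bilinear form x^T M y."""
    return sum(x[r] * M[r][s] * y[s] for r in range(len(x)) for s in range(len(y)))


def symmetrize(M):
    """Matrix of (phi(X, Y) + phi(Y, X)) / 2, as in (12.1.9)."""
    n = len(M)
    return [[Fraction(M[r][s] + M[s][r], 2) for s in range(n)] for r in range(n)]


def polarize(quad, x, y):
    """psi(x, y) recovered from the quadratic form alone (lambda = 1 in (12.1.12))."""
    s = [a + b for a, b in zip(x, y)]
    return Fraction(quad(s) - quad(x) - quad(y), 2)


M = [[1, 3], [-1, 2]]       # a non-symmetric bilinear form
A = symmetrize(M)           # its symmetric part


def quad_M(v):
    """The quadratic form to which M gives rise."""
    return bilinear(M, v, v)


x, y = [1, 2], [3, -1]
assert quad_M(x) == bilinear(A, x, x)                 # same quadratic form, cf. (12.1.11)
assert polarize(quad_M, x, y) == bilinear(A, x, y)    # A is the polarized form
```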

The geometrical significance of polarized operators depends 
primarily on the identity (12.1.12). In terms of the representing 
forms we have the corresponding identity 

$$(x+\lambda y)^{T}A(x+\lambda y) = x^{T}Ax + 2\lambda x^{T}Ay + \lambda^{2}y^{T}Ay. \qquad (12.1.14)$$

If the right-hand side of (12.1.14) is equated to zero we obtain a quadratic 
equation in $\lambda$, which is known as Joachimsthal’s equation.† 
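As an illustration, here is a sketch with illustrative names. It assumes that the conic or quadric is given by $x^{T}Ax = 0$ in homogeneous coordinates. The coefficients of Joachimsthal's equation are $x^{T}Ax$, $2x^{T}Ay$ and $y^{T}Ay$. Its roots $\lambda$ locate the points $x + \lambda y$ in which the line joining $x$ and $y$ meets the locus.

```python
import math


def bil(A, u, v):
    """u^T A v."""
    return sum(u[r] * A[r][s] * v[s] for r in range(len(u)) for s in range(len(v)))


def joachimsthal(A, x, y):
    """Coefficients (c0, c1, c2) of c0 + c1*lam + c2*lam**2, cf. (12.1.14)."""
    return bil(A, x, x), 2 * bil(A, x, y), bil(A, y, y)


def real_roots(A, x, y):
    """Real roots lam of Joachimsthal's equation; assumes y^T A y != 0."""
    c0, c1, c2 = joachimsthal(A, x, y)
    disc = c1 * c1 - 4 * c2 * c0
    if disc < 0:
        return []
    root = math.sqrt(disc)
    return sorted({(-c1 - root) / (2 * c2), (-c1 + root) / (2 * c2)})


# the conic x^2 + y^2 - z^2 = 0 and the line joining (0, 0, 1) to (1, 0, 0)
A = [[1, 0, 0], [0, 1, 0], [0, 0, -1]]
print(real_roots(A, [0, 0, 1], [1, 0, 0]))   # [-1.0, 1.0]
```

The roots $\pm 1$ give the points $(-1, 0, 1)$ and $(1, 0, 1)$, which both lie on the conic. A negative discriminant means that the line has no real intersections with the conic.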

12.2. Orthogonal reduction to diagonal form 

12.2.1. In studying problems of euclidean geometry by algebraic 
methods, we normally employ a non-homogeneous, rectangular, 
cartesian frame of reference. It is often desirable to change this 
frame while preserving the origin. The resulting transformation of 
coordinates is then, as we know, represented by an orthogonal 
matrix. The purpose of changing the frame of reference is to 
simplify the equations of the curves or surfaces we are investigating. 
Since these are frequently conics or quadrics we are led to inquire 
in what way the quadratic form $x^{T}Ax$ can be simplified as the result 
of the substitution $x = Px'$, where $P$ is orthogonal. The answer 
to this question turns out to be particularly simple and satisfactory. 

Definition 12.2.1. A diagonal (unit) quadratic form is a 
quadratic form associated with a diagonal (unit) matrix. 

Thus the quadratic form associated with the matrix $\mathrm{dg}(\alpha_1,\ldots,\alpha_n)$ 
is the diagonal quadratic form $\alpha_1x_1^2+\cdots+\alpha_nx_n^2$. In particular, the 
$n$-ary unit quadratic form is $x_1^2+\cdots+x_n^2 = x^{T}x$. 

Before proceeding further let us agree on a useful convention. 
If the quadratic form $x^{T}Ax$ is subjected to the linear transformation 
$x = Px'$, it becomes $x'^{T}(P^{T}AP)x' = x'^{T}Bx'$, say. Now 
generally it does not matter what symbols are used for the variables 
of a quadratic form, since the quadratic form is completely characterized 
by its matrix. We may therefore say that the transformation 
$x = Px'$ changes the quadratic form $x^{T}Ax$ into $x^{T}Bx$; and 
we shall employ this mode of expression whenever we find it convenient 
to do so. 

† For its use in geometry see Semple and Kneebone, Algebraic Projective 
Geometry, p. 107. 

Exercise 12.2.1. Show that the linear transformation x = Px' is 
orthogonal if and only if it transforms the unit quadratic form into itself. 

In preparation for the next theorem let us note that if $A$ is real 
and symmetric and $R(A) = r$, then, by Theorem 10.3.4 (p. 302) 
and the corollary to Theorem 10.2.3 (p. 296), the number of non-vanishing 
characteristic roots of $A$ is $r$. 

Theorem 12.2.1. (Orthogonal reduction of quadratic forms) 

Let $x^{T}Ax$ be a real quadratic form of rank $r$. Then there exists an 
orthogonal transformation which transforms it into the diagonal form 

$$\lambda_1x_1^2+\cdots+\lambda_rx_r^2,$$

where $\lambda_1,\ldots,\lambda_r$ are the non-vanishing characteristic roots of $A$. 

This result is essentially a restatement of the theorem on the 
orthogonal reduction of real symmetric matrices (Theorem 10.3.4). 
For by virtue of that theorem there exists an orthogonal matrix $P$ 
such that† 

$$P^{T}AP = P^{-1}AP = \mathrm{dg}(\lambda_1,\ldots,\lambda_r,0,\ldots,0),$$

where the expression on the right-hand side contains $n-r$ zeros. 
Hence the orthogonal transformation $x = Px'$ carries the given 
quadratic form $x^{T}Ax$ into 

$$x'^{T}(P^{T}AP)x' = x'^{T}\,\mathrm{dg}(\lambda_1,\ldots,\lambda_r,0,\ldots,0)\,x' = \lambda_1x_1'^2+\cdots+\lambda_rx_r'^2.$$

The process of transforming a given quadratic form into a diagonal 
quadratic form is known as reduction to diagonal form. If the 
transformation is orthogonal, we speak of an orthogonal reduction. 
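Theorem 12.2.1 asserts that a suitable orthogonal $P$ exists but does not compute it. The following numerical sketch uses the classical Jacobi rotation method. This method is substituted here for illustration and is not the procedure the text refers to on p. 303. The method builds $P$ as a product of plane rotations, each chosen to annihilate one off-diagonal element of the current matrix $P^{T}AP$. The function name `jacobi_reduce` and the tolerance are our own choices. Because the arithmetic is floating-point, the result is diagonal only up to rounding.

```python
import math


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(row) for row in zip(*X)]


def jacobi_reduce(A, tol=1e-12, max_sweeps=100):
    """Return (P, D): P orthogonal, D = P^T A P diagonal up to tol (A real symmetric)."""
    n = len(A)
    D = [[float(v) for v in row] for row in A]
    P = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = max((abs(D[i][j]) for i in range(n) for j in range(n) if i != j), default=0.0)
        if off < tol:
            break
        for i in range(n - 1):
            for j in range(i + 1, n):
                if abs(D[i][j]) < tol:
                    continue
                # angle for which the (i, j) element of R^T D R vanishes
                theta = 0.5 * math.atan2(2 * D[i][j], D[j][j] - D[i][i])
                c, s = math.cos(theta), math.sin(theta)
                R = [[float(k == l) for l in range(n)] for k in range(n)]
                R[i][i] = R[j][j] = c
                R[i][j], R[j][i] = s, -s
                D = matmul(transpose(R), matmul(D, R))
                P = matmul(P, R)        # accumulate the orthogonal transformation
    return P, D


A = [[0, -2, -2], [-2, 3, -1], [-2, -1, 3]]     # the matrix of (12.2.3)
P, D = jacobi_reduce(A)
```

The diagonal of `D` should then approximate the characteristic roots $-2, 4, 4$, in some order. The substitution $x = Px'$ carries $x^{T}Ax$ into the corresponding diagonal form.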

12.2.2. One of the most important applications of the transformation 
theory of quadratic forms occurs in the reduction of 
central quadrics (or conics) to principal axes. 


† It should be remembered that ‘orthogonal similarity transformation’ and 
‘orthogonal congruence transformation’ are identical notions. 

Theorem 12.2.2. Let the equation of a central quadric $Q$, with 
its centre at the origin, be 

$$\sum_{r,s=1}^{3} a_{rs}x_rx_s = 1,$$

where $x_1, x_2, x_3$ are non-homogeneous rectangular coordinates. Let 
$\lambda_1, \lambda_2, \lambda_3$ be the characteristic roots of the real symmetric $3\times 3$ matrix 
$A = (a_{rs})$ and let $\xi_1, \xi_2, \xi_3$ be orthonormal characteristic 
vectors of $A$ associated with $\lambda_1, \lambda_2, \lambda_3$ respectively.† Then the directions 
specified by $\xi_1, \xi_2, \xi_3$‡ are the directions of a set of principal axes of $Q$; 
and with respect to these axes as coordinate axes the equation of $Q$ 
assumes the form 

$$\lambda_1x_1'^2+\lambda_2x_2'^2+\lambda_3x_3'^2 = 1. \qquad (12.2.1)$$

Let $P$ denote the (orthogonal) matrix having $\xi_1, \xi_2, \xi_3$ as its 
columns. Then, by Theorem 10.2.1 (p. 293), 

$$P^{T}AP = \mathrm{dg}(\lambda_1,\lambda_2,\lambda_3). \qquad (12.2.2)$$

The original equation of $Q$ is given by $x^{T}Ax = 1$, where 
$x = (x_1, x_2, x_3)^{T}$. If $S$ denotes the original system of coordinates and 
$S'$ the system for which $\xi_1, \xi_2, \xi_3$ specify the directions of the axes, 
then, by Theorem 8.4.4 (p. 238), $x = Px'$, where $x' = (x_1', x_2', x_3')^{T}$ 
and $x_1', x_2', x_3'$ are the coordinates, with respect to $S'$, of a point whose 
coordinates with respect to $S$ are $x_1, x_2, x_3$. Hence, with respect to 
$S'$, the equation of $Q$ assumes the form 

$$(Px')^{T}A(Px') = 1,$$

and in view of (12.2.2) this is equivalent to 

$$\lambda_1x_1'^2+\lambda_2x_2'^2+\lambda_3x_3'^2 = 1.$$

It follows that $\xi_1, \xi_2, \xi_3$ specify the directions of a set of principal 
axes of $Q$, and the theorem is therefore proved. 

In examining the reduction of central quadrics to principal axes 
we must distinguish between three cases that may arise. 

Case I. ($\lambda_1, \lambda_2, \lambda_3$ distinct.) 

In view of the corollary to Theorem 7.6.1 (p. 216), each of the 
three vectors $\xi_1, \xi_2, \xi_3$ is, in this case, uniquely determined to 
within a scalar factor $\pm 1$. Hence $Q$ has a unique set of principal 
axes. This conclusion is confirmed by the equation (12.2.1), which 
shows that $Q$ is an ellipsoid or a hyperboloid, but not a quadric of 
revolution. 

† The existence of such a set of vectors is guaranteed by Theorem 10.3.5 (p. 304). 

‡ If $\xi$ is the vector $(p, q, r)^{T}$ then by ‘the direction specified by $\xi$’ we mean, of 
course, the direction of the straight line joining the origin to the point $(p, q, r)$. 

Case II. (Precisely two of $\lambda_1, \lambda_2, \lambda_3$ are equal.) 

To fix our ideas, let us assume that $\lambda_2 = \lambda_3$. In that case $\xi_1$ 
is determined to within a factor $\pm 1$. Now $\xi_1, \xi_2, \xi_3$ constitute a 
basis of the space of all real 3-vectors, and therefore any vector $\eta$ may be 
expressed in the form $\eta = c_1\xi_1 + c_2\xi_2 + c_3\xi_3$. 

If $\eta$ is orthogonal to $\xi_1$, then 

$$0 = \xi_1^{T}\eta = c_1\xi_1^{T}\xi_1 = c_1,$$

and therefore $\eta = c_2\xi_2 + c_3\xi_3$. Hence 

$$A\eta = c_2A\xi_2 + c_3A\xi_3 = c_2\lambda_2\xi_2 + c_3\lambda_2\xi_3 = \lambda_2\eta.$$

Hence any non-zero vector $\eta$ orthogonal to $\xi_1$ is necessarily a 
characteristic vector of $A$ associated with $\lambda_2$. Thus to choose 
$\xi_2$ we may take any unit vector in the plane 
through $O$ perpendicular to $\xi_1$. Finally $\xi_3$ 
must be at right angles to both $\xi_1$ and $\xi_2$ and 
so, for any given $\xi_1$ and $\xi_2$, it is fixed to within 
a scalar factor $\pm 1$. 

One principal axis of $Q$ (that corresponding 
to $\xi_1$) is therefore uniquely determined; the 
remaining two may be chosen arbitrarily 
(subject to their being at right angles) in 
the plane through $O$ perpendicular to the 
first axis. This is borne out by the equation 
(12.2.1), from which we can infer that $Q$ is an ellipsoid of revolution 
(but not a sphere) or a hyperboloid of revolution. 

Case III. ($\lambda_1, \lambda_2, \lambda_3$ all equal.) 

By (12.2.2) we have in this case 

$$P^{T}AP = \mathrm{dg}(\lambda_1,\lambda_1,\lambda_1) = \lambda_1 I,$$

and so, since $P$ is orthogonal, $A = P(\lambda_1 I)P^{T} = \lambda_1 I$. Since 
$(\lambda_1 I)\xi = \lambda_1\xi$, we see that every non-zero vector is a characteristic 
vector of $A$. It follows that any three mutually perpendicular lines through 
the origin may be taken as principal axes of $Q$. The equation (12.2.1) leads 
to the same conclusion since it tells us that $Q$ is a sphere. 

To carry out the reduction to principal axes in a numerical case 
we have to effect an orthogonal reduction of a symmetric matrix, 
and we do this by using the procedure described on p. 303. 
Consider the central quadric $Q$ given by the equation 

$$3y^2 + 3z^2 - 2yz - 4zx - 4xy = 1. \qquad (12.2.3)$$

The associated matrix is 

$$A = \begin{pmatrix} 0 & -2 & -2 \\ -2 & 3 & -1 \\ -2 & -1 & 3 \end{pmatrix},$$

and the characteristic roots of $A$ are found to be $-2, 4, 4$. Hence, 
referred to a set of principal axes, the equation of $Q$ assumes the 
form 

$$-2x'^2 + 4y'^2 + 4z'^2 = 1, \qquad (12.2.4)$$

and $Q$ is therefore seen to be a hyperboloid of revolution of one 
sheet. Let us next determine the formulae of transformation. If 
$(p, q, r)^{T}$ is a characteristic vector of $A$ associated with the characteristic 
root $\lambda$, then 

$$\begin{aligned} -2q - 2r &= \lambda p, \\ -2p + 3q - r &= \lambda q, \\ -2p - q + 3r &= \lambda r. \end{aligned} \qquad (12.2.5)$$

When $\lambda = -2$ this system of equations gives us $p = 2q$, $r = q$, 
and therefore a unit characteristic vector associated with the 
characteristic root $-2$ may be taken as $(2/\sqrt{6}, 1/\sqrt{6}, 1/\sqrt{6})^{T}$. Again, 
when $\lambda = 4$ the system (12.2.5) reduces to the single equation 
$2p + q + r = 0$. We have to choose two unit characteristic vectors, 
say $(p, q, r)^{T}$ and $(p', q', r')^{T}$, both of which are associated with the 
characteristic root 4 and at the same time are orthogonal to each 
other. Hence 

$$2p + q + r = 0, \qquad 2p' + q' + r' = 0, \qquad pp' + qq' + rr' = 0.$$

The two vectors in question can be chosen in an infinity of ways and 
one possible choice is $(1/\sqrt{3}, -1/\sqrt{3}, -1/\sqrt{3})^{T}$, $(0, 1/\sqrt{2}, -1/\sqrt{2})^{T}$.† 

The orthogonal matrix $P$ whose columns are the three characteristic 
vectors just found is then given by 

$$P = \begin{pmatrix} 2/\sqrt{6} & 1/\sqrt{3} & 0 \\ 1/\sqrt{6} & -1/\sqrt{3} & 1/\sqrt{2} \\ 1/\sqrt{6} & -1/\sqrt{3} & -1/\sqrt{2} \end{pmatrix},$$

and we have $P^{T}AP = \mathrm{dg}(-2, 4, 4)$. 
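The result can be checked numerically. The short sketch below uses floating-point arithmetic, so the equalities hold only up to rounding. It builds $P$ from the three unit characteristic vectors found above and verifies that $P^{T}P = I$ and $P^{T}AP = \mathrm{dg}(-2,4,4)$. The helper names are our own.

```python
import math


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(row) for row in zip(*X)]


def close(X, Y, eps=1e-10):
    """Entrywise comparison up to rounding error."""
    return all(abs(a - b) < eps for rx, ry in zip(X, Y) for a, b in zip(rx, ry))


A = [[0, -2, -2], [-2, 3, -1], [-2, -1, 3]]
r6, r3, r2 = math.sqrt(6), math.sqrt(3), math.sqrt(2)
# columns: unit characteristic vectors for the roots -2, 4, 4
P = [[2 / r6, 1 / r3, 0.0],
     [1 / r6, -1 / r3, 1 / r2],
     [1 / r6, -1 / r3, -1 / r2]]

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
assert close(matmul(transpose(P), P), identity)                          # P is orthogonal
assert close(matmul(transpose(P), matmul(A, P)), [[-2, 0, 0], [0, 4, 0], [0, 0, 4]])
```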

† The two vectors are, of course, automatically orthogonal to the vector 
$(2/\sqrt{6}, 1/\sqrt{6}, 1/\sqrt{6})^{T}$ chosen earlier. 

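The substitution $x = Px'$, where $x = (x, y, z)^{T}$, $x' = (x', y', z')^{T}$, 
thus reduces the equation (12.2.3) to the form (12.2.4). This substitution 
can be written, more explicitly, as 

$$\begin{aligned} x &= \tfrac{2}{\sqrt{6}}x' + \tfrac{1}{\sqrt{3}}y', \\ y &= \tfrac{1}{\sqrt{6}}x' - \tfrac{1}{\sqrt{3}}y' + \tfrac{1}{\sqrt{2}}z', \\ z &= \tfrac{1}{\sqrt{6}}x' - \tfrac{1}{\sqrt{3}}y' - \tfrac{1}{\sqrt{2}}z'. \end{aligned}$$

The three characteristic vectors specify the direction ratios of a set 
of principal axes. The equations of these axes are therefore given 
by 

$$x = 2y = 2z; \qquad -x = y = z; \qquad x = 0,\; y = -z.$$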

12.2.3. The relation between the principal axes of a quadric $Q$, 
given by the equation $x^{T}Ax = 1$, and the characteristic vectors 
of $A$ was established above by means of the transformation theory 
of quadratic forms (i.e. of symmetric matrices). A more direct 
derivation proceeds as follows. If $x$ is the position vector of a point 
$P$, then the polar plane $\pi$ of $P$ with respect to $Q$ has the equation 
$(Ax)^{T}y = 1$ in running coordinates $y$, so that the direction ratios of the 
normal to $\pi$ are given by the components of the vector $Ax$. 
Hence the line $OP$ is perpendicular to $\pi$ if and only if $Ax = \lambda x$ for 
some (non-zero) value of $\lambda$. But $OP$ is perpendicular to $\pi$ if and only 
if $P$ lies on one of the principal axes of $Q$. Thus each principal axis 
of $Q$ is specified by a characteristic vector of $A$. 
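The following sketch gives a concrete check of this argument with the matrix of (12.2.3). It uses exact integer arithmetic, and the helper names are illustrative. It tests whether $Ax$ is parallel to $x$ by computing the cross product $Ax \times x$. For non-zero $x$ this product vanishes exactly when $x$ is a characteristic vector. Besides the three axes found in § 12.2.2, the check therefore also accepts any direction perpendicular to $(2, 1, 1)$, as Case II leads us to expect.

```python
def apply(A, x):
    """The vector A x."""
    return [sum(A[i][j] * x[j] for j in range(3)) for i in range(3)]


def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]


def on_principal_axis(A, x):
    """True if the normal A x to the polar plane of x is parallel to OP (x != 0)."""
    return cross(apply(A, x), x) == [0, 0, 0]


A = [[0, -2, -2], [-2, 3, -1], [-2, -1, 3]]
assert on_principal_axis(A, [2, 1, 1])        # axis x = 2y = 2z
assert on_principal_axis(A, [1, -1, -1])      # axis -x = y = z
assert on_principal_axis(A, [0, 1, -1])       # axis x = 0, y = -z
assert not on_principal_axis(A, [1, 0, 0])
```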


12.3. General reduction to diagonal form 

In the preceding section we studied the reduction of quadratic 
forms to diagonal form by means of orthogonal transformations. 
In the geometrical context the insistence on orthogonality is 
necessary only when we investigate metrical properties of figures 
and it loses its significance in affine and projective geometry. 
We propose, therefore, to consider next reductions to diagonal 
form effected by non-singular linear transformations which are 
not subject to further restrictions. An analogous problem arises 
for bilinear forms and we shall deal with it before turning again to 
quadratic forms. 

12.3.1. A bilinear operator can be represented in many different 
ways by a bilinear form, and we are therefore confronted by the 
problem of determining representations which shall be as simple 
as possible. The next theorem shows that a ‘diagonal representation’ 
can always be found. 

Theorem 12.3.1. Any bilinear form of rank $r$ can be changed into 
the bilinear form 

$$x_1y_1+\cdots+x_ry_r \qquad (12.3.1)$$

by means of a non-singular linear transformation. 

This result states, in fact, that if $\phi$ is a bilinear operator, of rank $r$, 
on the linear manifolds $\mathfrak{M}$ and $\mathfrak{N}$, then bases in $\mathfrak{M}$ and $\mathfrak{N}$ can be 
found with respect to which $\phi$ is represented by the bilinear form 
(12.3.1). 

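To prove the theorem denote the given bilinear form by $x^{T}Ay$, 
where $A$ is an $m\times n$ matrix and $R(A) = r$. Then, by Theorem 6.2.3 
(p. 176), there exist non-singular matrices $P$, $Q$ (of orders $m$, $n$ 
respectively) such that 

$$P^{T}AQ = \begin{pmatrix} I_r & O \\ O & O \end{pmatrix},$$

where the right-hand side denotes the $m\times n$ matrix whose leading 
$r\times r$ block is the unit matrix $I_r$ and whose remaining elements are all zero. 
Hence the non-singular linear transformation $x = P\xi$, $y = Q\eta$ 
changes the bilinear form $x^{T}Ay$ into 

$$\xi^{T}(P^{T}AQ)\eta = \xi_1\eta_1+\cdots+\xi_r\eta_r,$$

where $\xi = (\xi_1,\ldots,\xi_m)^{T}$, $\eta = (\eta_1,\ldots,\eta_n)^{T}$. This proves the theorem. 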

We may, at the same time, note that if the original bilinear form is 
real, then it can be reduced to (12.3.1) by a real transformation. 
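The reduction in Theorem 12.3.1 can be carried out by elementary row and column operations. The following sketch is our own implementation and assumes exact rational entries. The name `normal_form` is illustrative. It returns non-singular matrices $L$ and $R$ such that $LAR$ has $I_r$ in its leading block and zeros elsewhere. In the notation of the proof one may then take $P = L^{T}$ and $Q = R$. Since only rational operations are used, a real (indeed rational) form is reduced by a real transformation, in agreement with the remark just made.

```python
from fractions import Fraction


def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def normal_form(A):
    """Return (L, R, r): L, R non-singular, L A R = [[I_r, 0], [0, 0]], r = rank of A."""
    m, n = len(A), len(A[0])
    M = [[Fraction(v) for v in row] for row in A]
    L, R = identity(m), identity(n)          # invariant: M == L * A * R
    r = 0
    while r < min(m, n):
        pivot = next(((i, j) for i in range(r, m) for j in range(r, n) if M[i][j] != 0), None)
        if pivot is None:
            break
        i, j = pivot
        # move the pivot to position (r, r): one row swap and one column swap
        M[r], M[i] = M[i], M[r]
        L[r], L[i] = L[i], L[r]
        for row in M:
            row[r], row[j] = row[j], row[r]
        for row in R:
            row[r], row[j] = row[j], row[r]
        # scale row r so that the pivot becomes 1
        p = M[r][r]
        M[r] = [v / p for v in M[r]]
        L[r] = [v / p for v in L[r]]
        # row operations: clear the rest of column r
        for k in range(m):
            if k != r and M[k][r] != 0:
                f = M[k][r]
                M[k] = [a - f * b for a, b in zip(M[k], M[r])]
                L[k] = [a - f * b for a, b in zip(L[k], L[r])]
        # column operations: clear the rest of row r
        for c in range(n):
            if c != r and M[r][c] != 0:
                f = M[r][c]
                for row in M:
                    row[c] -= f * row[r]
                for row in R:
                    row[c] -= f * row[r]
        r += 1
    return L, R, r
```

For example, `normal_form([[1, 2, 3], [2, 4, 6]])` reports rank 1. It shows that the form $x_1y_1 + 2x_1y_2 + 3x_1y_3 + 2x_2y_1 + 4x_2y_2 + 6x_2y_3$ is equivalent to $\xi_1\eta_1$.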

Definition 12.3.1. Two bilinear forms are equivalent if one of 
them can be transformed into the other by a non-singular linear 
transformation. 

Thus equivalence of bilinear forms means equivalence with 
respect to the group of non-singular linear transformations. 

Exercise 12.3.1. Show that the relation between bilinear forms specified 
in Definition 12.3.1 is an equivalence relation in the sense of Definition 
6.6.1 (p. 186). 

Theorem 12.3.2. Each of the following three conditions relating 
to the bilinear forms $\phi$, $\psi$ implies the other two: 

(i) $\phi$ and $\psi$ are equivalent; 

(ii) $\phi$ and $\psi$ represent the same bilinear operator; 

(iii) $\phi$ and $\psi$ have the same rank. 
Let the matrices associated with $\phi$, $\psi$ be denoted by $A$, $B$ 
respectively. Statement (i) then means simply that there exist 
non-singular matrices $P$, $Q$ such that 

$$B = P^{T}AQ, \qquad (12.3.2)$$

and, by Theorem 12.1.3 (p. 355), this implies and is implied by (ii). 
Again, by Theorem 6.2.3, (12.3.2) implies and is implied by (iii). 
The assertion is therefore proved. 

Exercise 12.3.2. Show that two bilinear forms are equivalent if and 
only if their matrices are equivalent in the sense of Definition 6.2.2 (p. 176). 

12.3.2. Turning now to quadratic forms we observe at once the 
following result. 

Theorem 12.3.3. A real quadratic form of rank $r$ can be transformed 
by a real non-singular linear transformation into the diagonal form 

$$\alpha_1x_1^2+\cdots+\alpha_rx_r^2,$$

where $\alpha_1,\ldots,\alpha_r$ are all non-zero. 


This is, of course, a weakened version of Theorem 12.2.1. Alternatively 
it follows immediately by Exercise 6.4.3 (p. 185), since real 
non-singular linear transformations of quadratic forms correspond 
to real congruence transformations of the associated matrix. 

Theorem 12.3.3 shows, in fact, that every quadratic operator on 
a linear manifold $\mathfrak{M}$ possesses a ‘diagonal representation’ for an 
appropriate choice of basis in $\mathfrak{M}$. The theorem is also important 
in projective geometry since it shows that, with respect to a suitable 
coordinate system, the equation of a conic or a quadric assumes a 
form involving only the squares of the coordinates. 

Exercise 12.3.3. Show that the transformation in Theorem 12.3.3 can 
be chosen in such a way that each $\alpha_k$ has the value $\pm 1$. 

Exercise 12.3.4. Show that Theorem 12.3.3 remains valid for complex 
quadratic forms, provided that complex transformations are admitted. By 
using the additional transformation 

$$x_k = y_k/\sqrt{\alpha_k} \quad (k = 1,\ldots,r), \qquad x_k = y_k \quad (k = r+1,\ldots,n),$$

show also that in this case all $\alpha$'s can be made equal to 1. 

As a consequence of Theorem 12.3.3 we have the following 
corollary. 

Corollary. An $n$-ary real singular quadratic form can be reduced 
by a real non-singular linear transformation to the form 

$$\alpha_1x_1^2+\cdots+\alpha_{n-1}x_{n-1}^2.$$

In view of Theorem 12.3.3 it may be asked whether it is possible 
to reduce a quadratic form of rank r to diagonal form in which the 
number of non-vanishing coefficients is not equal to r. It is, 
however, almost obvious that such a reduction is not possible. 

Theorem 12.3.4. If a quadratic form of rank $r$ is reduced by a 
non-singular linear transformation to diagonal form, then the latter 
must have precisely $r$ non-vanishing coefficients. 

For, by Theorem 12.1.6 (iii) (p. 360), the rank of a quadratic 
form is invariant under non-singular linear transformations, and 
the rank of a diagonal quadratic form is equal to the number of its 
non-vanishing coefficients. 

The theory of linear transformations of quadratic forms may be 
stated in a language slightly different from that used so far. Consider, 
for instance, the binary quadratic form 

$$ax^2 + 2bxy + cy^2, \qquad (12.3.3)$$

and suppose that a change of variables, specified by the equations 

$$x' = \alpha x + \beta y, \qquad y' = \gamma x + \delta y,$$

transforms it into 

$$a'x'^2 + 2b'x'y' + c'y'^2. \qquad (12.3.4)$$

Instead of regarding the transition from (12.3.3) to (12.3.4) as a 
result of the transformation of variables, we can suppress all reference 
to the introduction of new variables $x'$, $y'$ and exhibit the 
relation between the original and the resulting quadratic form as 
an identity, namely, 

$$ax^2 + 2bxy + cy^2 = a'(\alpha x+\beta y)^2 + 2b'(\alpha x+\beta y)(\gamma x+\delta y) + c'(\gamma x+\delta y)^2.$$

By adopting, as we obviously can, the same point of view in the 
case of $n$-ary quadratic forms we are led to the following conclusion. 

Theorem 12.3.5. A quadratic form in $x_1,\ldots,x_n$, of rank $r$, can be 
expressed as a linear combination, with non-zero coefficients, of the 
squares of $r$ linearly independent linear forms in $x_1,\ldots,x_n$. 

This result is implied by Theorem 12.3.3. Let 

$$\phi(x_1,\ldots,x_n) = x^{T}Ax,$$

where $x = (x_1,\ldots,x_n)^{T}$, be the given quadratic form. Then a 
suitable non-singular linear transformation $x = Py$ changes it to 
$\alpha_1y_1^2+\cdots+\alpha_ry_r^2$, where $y = (y_1,\ldots,y_n)^{T}$. We may 
think of each $y_k$ as a linear form in $x_1,\ldots,x_n$ determined by the relation 
$y = P^{-1}x$, and we then have, identically in $x_1,\ldots,x_n$, 

$$\phi(x_1,\ldots,x_n) = \alpha_1y_1^2+\cdots+\alpha_ry_r^2.$$

Moreover, since $|P^{-1}| \neq 0$, it follows by Theorem 5.5.2 (p. 153) 
that not only $y_1,\ldots,y_r$ but, indeed, $y_1,\ldots,y_n$ are linearly independent 
linear forms. 

Exercise 12.3.5. Express -\-2xy as a linear combination of 
two squares. 

12.3.3. Theorems 12.3.3 and 12.3.5 are pure existence theorems 
and do not provide a procedure for carrying out the reduction of a 
given quadratic form to diagonal form. Such a reduction can, of 
course, always be effected by an orthogonal transformation.† 
However, the amount of computation involved in this process 
is often prohibitive; and when orthogonality is not essential, 
Lagrange’s method of reduction, explained below, is preferable. 
In addition to its practical utility this method (which was devised 
by Lagrange in 1759 and rediscovered by Gauss in 1823) provides, 
in conjunction with Theorem 12.3.4, a new proof of Theorems 
12.3.3 and 12.3.5, and so makes them independent of the relatively 
difficult Theorems 6.4.1 and 10.3.4. 

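Lagrange’s method is a process of successive reduction by means 
of which we eventually remove all mixed terms $x_rx_s$ ($r \neq s$) from 
the initial quadratic form 

$$\phi = \sum_{r,s=1}^{n} a_{rs}x_rx_s = a_{11}x_1^2+\cdots+a_{nn}x_n^2 + 2a_{12}x_1x_2+\cdots+2a_{n-1,n}x_{n-1}x_n.$$

If at least one of $a_{11},\ldots,a_{nn}$ is not zero we may assume, without loss 
of generality, that $a_{11} \neq 0$.‡ Then 

$$\begin{aligned} \phi &= a_{11}x_1^2 + 2a_{12}x_1x_2+\cdots+2a_{1n}x_1x_n + \sum_{r,s=2}^{n} a_{rs}x_rx_s \\ &= a_{11}\Big(x_1^2 + 2\frac{a_{12}}{a_{11}}x_1x_2+\cdots+2\frac{a_{1n}}{a_{11}}x_1x_n\Big) + \psi_1(x_2,\ldots,x_n), \text{ say.} \end{aligned}$$

Hence 

$$\phi = a_{11}\Big(x_1 + \frac{a_{12}}{a_{11}}x_2+\cdots+\frac{a_{1n}}{a_{11}}x_n\Big)^2 + \phi_2(x_2,\ldots,x_n),$$

say, where $\phi_2$ is a quadratic form in $x_2,\ldots,x_n$. We now see that the 
(obviously non-singular) linear transformation 

$$y_1 = x_1 + \frac{a_{12}}{a_{11}}x_2+\cdots+\frac{a_{1n}}{a_{11}}x_n, \qquad y_2 = x_2, \quad \ldots, \quad y_n = x_n$$

changes $\phi$ into the form 

$$\alpha_1y_1^2 + \phi_2(y_2,\ldots,y_n) \qquad (12.3.5)$$

(where actually $\alpha_1 = a_{11}$). 

† See the example in § 12.2.2. 

‡ If $a_{11} = 0$ and $a_{rr} \neq 0$ for some $r > 1$, we may first apply the non-singular 
linear transformation 

$$x_1 = \xi_r, \qquad x_r = \xi_1, \qquad x_s = \xi_s \quad (s \neq 1, r).$$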

However, the above procedure breaks down if $a_{11} = \cdots = a_{nn} = 0$, 
and in that case an additional step is required. We still have $a_{rs} \neq 0$ 
for some $r, s$ such that $r \neq s$, for otherwise the quadratic form would 
vanish identically. Assume, then, without loss of generality, that 
$a_{12} \neq 0$ and use the transformation 

$$x_1 = \xi_1 + \xi_2, \qquad x_2 = \xi_1 - \xi_2, \qquad x_3 = \xi_3, \quad \ldots, \quad x_n = \xi_n.$$

This transformation has determinant $-2$ and so is non-singular. 
It carries $\phi$ into a quadratic form in $\xi_1,\ldots,\xi_n$ in which the term in 
$\xi_1^2$ is present (its coefficient being $2a_{12}$); to this new quadratic form 
we now apply the method described earlier and obtain $\phi$ in the form (12.3.5). 

It is thus in every case possible to change the given quadratic 
form $\phi$ into (12.3.5) by a non-singular linear transformation. The 
reduction to diagonal form may now be completed by repeating 
the process as often as is necessary. Treating $\phi_2$ in the same way 
as we treated $\phi$ above, we observe that there exists a real non-singular 
transformation of $y_2,\ldots,y_n$ into $z_2,\ldots,z_n$ such that 

$$\phi_2 = \alpha_2z_2^2 + \phi_3(z_3,\ldots,z_n),$$

where $\phi_3$ is a quadratic form in $z_3,\ldots,z_n$. If the equations connecting 
$y_2,\ldots,y_n$ and $z_2,\ldots,z_n$ are supplemented by the equation $y_1 = z_1$, 
then the transformation from $y_1,\ldots,y_n$ to $z_1,\ldots,z_n$ is again non-singular; 
and $\phi$ is seen to assume the form 

$$\phi = \alpha_1z_1^2 + \alpha_2z_2^2 + \phi_3(z_3,\ldots,z_n),$$

the change being effected by a non-singular transformation from 
$x_1,\ldots,x_n$ to $z_1,\ldots,z_n$. 

We continue this process of reduction as long as any mixed terms 
are left, and ultimately obtain $\phi$ in the form 

$$\phi = \alpha_1w_1^2+\cdots+\alpha_rw_r^2,$$

where $r = R(\phi)$ by Theorem 12.3.4. The reduction has been effected by a 
succession of real non-singular linear transformations whose resultant is 
therefore again a real non-singular linear transformation. It is clear, 
moreover, that the procedure applies equally well to complex 
quadratic forms. We conclude, therefore, that any quadratic 
form, real or complex, can be reduced to diagonal form by means 
of a non-singular linear transformation. 
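Lagrange's procedure is readily mechanized. The sketch below is a variant of the procedure. It works directly on the symmetric matrix in the original variables instead of through successive substitutions.

- When some $a_{kk} \neq 0$, it splits off the square $a_{kk}^{-1}(a_{k1}x_1+\cdots+a_{kn}x_n)^2$, which removes every term involving $x_k$.
- When all diagonal elements vanish but $a_{ij} \neq 0$, it uses the identity $2(s_i^{T}x)(s_j^{T}x)/a_{ij} = \{(s_i+s_j)^{T}x\}^2/(2a_{ij}) - \{(s_i-s_j)^{T}x\}^2/(2a_{ij})$, where $s_i$ and $s_j$ are the $i$th and $j$th rows. This takes the place of the substitution $x_1 = \xi_1+\xi_2$, $x_2 = \xi_1-\xi_2$ used in the text.

The function name and output format are our own. Exact fractions avoid rounding.

```python
from fractions import Fraction


def lagrange_reduce(A):
    """Return [(alpha, lin), ...] with x^T A x == sum(alpha * (lin . x)**2), A symmetric."""
    n = len(A)
    S = [[Fraction(v) for v in row] for row in A]
    terms = []
    while True:
        k = next((i for i in range(n) if S[i][i] != 0), None)
        if k is not None:
            alpha = S[k][k]
            lin = [v / alpha for v in S[k]]
            terms.append((alpha, lin))
            # subtract alpha * (lin . x)^2: afterwards no term involves x_k
            S = [[S[i][j] - alpha * lin[i] * lin[j] for j in range(n)] for i in range(n)]
            continue
        pair = next(((i, j) for i in range(n) for j in range(i + 1, n) if S[i][j] != 0), None)
        if pair is None:
            return terms              # what is left vanishes identically
        i, j = pair
        a = S[i][j]
        plus = [p + q for p, q in zip(S[i], S[j])]
        minus = [p - q for p, q in zip(S[i], S[j])]
        terms.append((1 / (2 * a), plus))
        terms.append((-1 / (2 * a), minus))
        # subtract the form 2 (s_i . x)(s_j . x) / a: rows i and j become zero
        S = [[S[r][c] - (S[i][r] * S[j][c] + S[j][r] * S[i][c]) / a for c in range(n)]
             for r in range(n)]


# matrix of 2x1x2 - x1x3 + x1x4 - x2x3 + x2x4 - 2x3x4 (the example treated next)
h = Fraction(1, 2)
A = [[0, 1, -h, h], [1, 0, -h, h], [-h, -h, 0, -1], [h, h, -1, 0]]
terms = lagrange_reduce(A)
```

For this quaternary form the sketch returns the squares of $x_1+x_2-x_3+x_4$, $-x_1+x_2$ and $x_3+x_4$ with coefficients $\tfrac{1}{2}, -\tfrac{1}{2}, -\tfrac{1}{2}$. This agrees with the identity obtained by hand below. The number of terms returned is the rank, 3.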

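To illustrate Lagrange’s method of reduction in a numerical 
case, let us consider the quaternary quadratic form 

$$\phi = 2x_1x_2 - x_1x_3 + x_1x_4 - x_2x_3 + x_2x_4 - 2x_3x_4.$$

Putting 

$$x_1 = y_1 + y_2, \qquad x_2 = y_1 - y_2, \qquad x_3 = y_3, \qquad x_4 = y_4, \qquad (12.3.6)$$

we obtain 

$$\begin{aligned} \phi &= 2y_1^2 - 2y_2^2 - 2y_1y_3 + 2y_1y_4 - 2y_3y_4 \\ &= 2(y_1 - \tfrac{1}{2}y_3 + \tfrac{1}{2}y_4)^2 - 2y_2^2 - \tfrac{1}{2}y_3^2 - \tfrac{1}{2}y_4^2 - y_3y_4. \end{aligned}$$

Next, putting 

$$z_1 = y_1 - \tfrac{1}{2}y_3 + \tfrac{1}{2}y_4, \qquad z_2 = y_2, \qquad z_3 = y_3, \qquad z_4 = y_4, \qquad (12.3.7)$$

we obtain 

$$\phi = 2z_1^2 - 2z_2^2 - \tfrac{1}{2}z_3^2 - \tfrac{1}{2}z_4^2 - z_3z_4 = 2z_1^2 - 2z_2^2 - \tfrac{1}{2}(z_3+z_4)^2.$$

Hence, writing 

$$w_1 = z_1, \qquad w_2 = z_2, \qquad w_3 = z_3 + z_4, \qquad w_4 = z_4, \qquad (12.3.8)$$

we have 

$$\phi = 2w_1^2 - 2w_2^2 - \tfrac{1}{2}w_3^2. \qquad (12.3.9)$$

Moreover, we see by (12.3.6), (12.3.7), and (12.3.8) that the transformation 
effecting the reduction of $\phi$ to the diagonal form (12.3.9) 
is given by 

$$\begin{aligned} x_1 &= w_1 + w_2 + \tfrac{1}{2}w_3 - w_4, \\ x_2 &= w_1 - w_2 + \tfrac{1}{2}w_3 - w_4, \\ x_3 &= w_3 - w_4, \\ x_4 &= w_4. \end{aligned}$$

Alternatively, we may write 

$$w_1 = \tfrac{1}{2}(x_1 + x_2 - x_3 + x_4), \qquad w_2 = \tfrac{1}{2}(x_1 - x_2), \qquad w_3 = x_3 + x_4, \qquad w_4 = x_4,$$

and (12.3.9) may be expressed in the form of the identity 

$$2x_1x_2 - x_1x_3 + x_1x_4 - x_2x_3 + x_2x_4 - 2x_3x_4 = \tfrac{1}{2}(x_1 + x_2 - x_3 + x_4)^2 - \tfrac{1}{2}(x_1 - x_2)^2 - \tfrac{1}{2}(x_3 + x_4)^2.$$

We recognize, incidentally, that $R(\phi) = 3$. 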
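The arithmetic can be checked as follows. This is a sketch that uses exact fractions. The names `T` and `D` are our own labels. The matrix $T$ of the transformation $x = Tw$ found above should satisfy $T^{T}AT = \mathrm{dg}(2, -2, -\tfrac{1}{2}, 0)$, where $A$ is the symmetric matrix of $\phi$. This follows from Theorem 12.1.6 (i).

```python
from fractions import Fraction


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(row) for row in zip(*X)]


h = Fraction(1, 2)
# symmetric matrix of phi = 2x1x2 - x1x3 + x1x4 - x2x3 + x2x4 - 2x3x4
A = [[0, 1, -h, h], [1, 0, -h, h], [-h, -h, 0, -1], [h, h, -1, 0]]
# row k gives x_k in terms of w1, ..., w4
T = [[1, 1, h, -1],
     [1, -1, h, -1],
     [0, 0, 1, -1],
     [0, 0, 0, 1]]
D = matmul(transpose(T), matmul(A, T))
assert D == [[2, 0, 0, 0], [0, -2, 0, 0], [0, 0, -h, 0], [0, 0, 0, 0]]
```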

12.3.4. The next theorem shows in what circumstances a given 
quadratic form can be expressed as a product of two linear forms. 

Theorem 12.3.6. Let $\phi$ be a complex quadratic form. (i) $\phi$ is 
the square of a non-vanishing linear form if and only if $R(\phi) = 1$. 
(ii) $\phi$ is the product of two linearly independent linear forms if and 
only if $R(\phi) = 2$. 

Suppose first that $\phi = (\beta_1x_1+\cdots+\beta_nx_n)^2$. If $\phi$ does not vanish 
identically we may assume, without loss of generality, that $\beta_1 \neq 0$. 
The non-singular linear transformation 

$$y_1 = \beta_1x_1+\cdots+\beta_nx_n, \qquad y_2 = x_2, \quad \ldots, \quad y_n = x_n$$

carries $\phi$ into $y_1^2$. Hence $R(\phi) = 1$. On the other hand, if $R(\phi) = 1$, 
then, by virtue of Theorem 12.3.5 (as applied to complex quadratic 
forms), it is possible to write $\phi$ in the form $\phi = \alpha_1y_1^2$, where $\alpha_1 \neq 0$ 
and $y_1 = \beta_1x_1+\cdots+\beta_nx_n$ is a (non-vanishing) linear form in $x_1,\ldots,x_n$. Thus 

$$\phi = \{\sqrt{\alpha_1}\,(\beta_1x_1+\cdots+\beta_nx_n)\}^2,$$

and (i) is proved. 

Next, let 

$$\phi = (\alpha_1x_1+\cdots+\alpha_nx_n)(\beta_1x_1+\cdots+\beta_nx_n),$$

where the two linear forms on the right-hand side are linearly 
independent, i.e. are not multiples of each other. We may then 
assume that 

$$\begin{vmatrix} \alpha_1 & \alpha_2 \\ \beta_1 & \beta_2 \end{vmatrix} \neq 0,$$

since at least one such minor is not zero. The non-singular linear 
transformation 

$$y_1 = \alpha_1x_1+\cdots+\alpha_nx_n, \qquad y_2 = \beta_1x_1+\cdots+\beta_nx_n, \qquad y_3 = x_3, \quad \ldots, \quad y_n = x_n$$

carries $\phi$ into $y_1y_2$, and the non-singular linear transformation 

$$y_1 = z_1 + z_2, \qquad y_2 = z_1 - z_2, \qquad y_3 = z_3, \quad \ldots, \quad y_n = z_n$$

carries $y_1y_2$ into $z_1^2 - z_2^2$. Hence $R(\phi) = 2$. On the other hand, 
if $R(\phi) = 2$, then, by Theorem 12.3.5, $\phi$ can be written as 
$\phi = \alpha_1y_1^2 + \alpha_2y_2^2$ ($\alpha_1\alpha_2 \neq 0$), where $y_1$, $y_2$ are linearly independent 
linear forms in $x_1,\ldots,x_n$. Hence 

$$\phi = (\sqrt{\alpha_1}\,y_1 + i\sqrt{\alpha_2}\,y_2)(\sqrt{\alpha_1}\,y_1 - i\sqrt{\alpha_2}\,y_2),$$

and the expressions in brackets are again linearly independent 
linear forms in $x_1,\ldots,x_n$. The proof is therefore complete. 

Corollary. A complex ternary quadratic form is the product of 
two linear factors if and only if it is singular. 

This corollary provides us with a criterion for the degeneracy of 
conics. If in homogeneous (cartesian, affine, or projective) 
coordinates $x, y, z$ the equation of the conic $C$ is of the form 
$x^{T}Ax = 0$, where $x = (x, y, z)^{T}$, then $C$ is degenerate (i.e. breaks 
up into two distinct lines or a repeated line) if and only if the 
determinant $|A|$ vanishes. 
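The criterion is easy to apply mechanically. In the sketch below the arithmetic is exact and the function names are illustrative. A conic $x^{T}Ax = 0$ is reported as degenerate when $|A| = 0$. Theorem 12.3.6 is then used to distinguish two distinct lines ($R = 2$) from a repeated line ($R = 1$). The lines are understood over the complex field, so a real conic such as $x^2 + y^2 = 0$ counts as a pair of conjugate complex lines.

```python
from fractions import Fraction


def det3(A):
    """Determinant of a 3 x 3 matrix by expansion along the first row."""
    return (A[0][0] * (A[1][1] * A[2][2] - A[1][2] * A[2][1])
            - A[0][1] * (A[1][0] * A[2][2] - A[1][2] * A[2][0])
            + A[0][2] * (A[1][0] * A[2][1] - A[1][1] * A[2][0]))


def rank(M):
    """Rank by row reduction with exact fractions."""
    a = [[Fraction(v) for v in row] for row in M]
    r = 0
    for c in range(len(a[0])):
        p = next((i for i in range(r, len(a)) if a[i][c] != 0), None)
        if p is None:
            continue
        a[r], a[p] = a[p], a[r]
        for i in range(len(a)):
            if i != r and a[i][c] != 0:
                f = a[i][c] / a[r][c]
                a[i] = [x - f * y for x, y in zip(a[i], a[r])]
        r += 1
    return r


def classify_conic(A):
    """Classify the conic x^T A x = 0 (A symmetric 3 x 3), lines taken over C."""
    if det3(A) != 0:
        return "non-degenerate"
    return {2: "two distinct lines", 1: "repeated line"}.get(rank(A), "form vanishes identically")


print(classify_conic([[1, 0, 0], [0, -1, 0], [0, 0, 0]]))   # x^2 - y^2 = 0
```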

12.4. The problem of equivalence. Rank and signature 

12.4.1. Consider a system $S(x, y, z)$ of projective coordinates in 
the plane. With respect to this system any conic $C$ will have an 
equation of the form 

$$\phi(x, y, z) = 0, \qquad (12.4.1)$$

where $\phi$ is a ternary quadratic form. If $S'(x', y', z')$ is a second 
system of projective coordinates, then the two systems are connected 
by a relation $(x', y', z')^{T} = H(x, y, z)^{T}$, where $H$ is a non-singular 
$3\times 3$ matrix. This linear transformation changes the 
equation (12.4.1) of $C$ into 

$$\psi(x', y', z') = 0,$$

where $\psi$ is again a ternary quadratic form. The new equation still 
represents the same conic; and consequently, from the point of 
view of geometrical interpretation, what is important is not an 
individual quadratic form but the entire class of quadratic forms 
that can be obtained from $\phi$ by non-singular linear transformations. 
All these quadratic forms will represent, when equated to 0, the 
same conic $C$, referred in each case to a suitable system of projective 
coordinates. 

It is principally for this reason that we are interested in quadratic 
forms which can be obtained from each other by non-singular linear 
transformations. Accordingly we introduce the following definition 
which is, in point of fact, implicit in our previous terminology. 

Definition 12.4.1. If a quadratic form $\psi$ is obtained from the quadratic form $\phi$ by a complex (real) non-singular linear transformation, then $\psi$ is said to be equivalent to $\phi$ under the group of complex (real) non-singular linear transformations.

The two groups of transformations mentioned here will be denoted, for brevity, by $T_c$ and $T_r$. The two relations between $\phi$ and $\psi$ are clearly equivalence relations. We can express these relations in terms of the matrices $A$, $B$ of $\phi$, $\psi$ respectively by saying that there exists a non-singular matrix $P$ such that $B = P^TAP$. Here $P$ is complex in the case of equivalence with respect to $T_c$ and real in the case of equivalence with respect to $T_r$.

Our problem is to determine all equivalence classes defined by the groups $T_c$ and $T_r$, and for $T_c$ the solution of this problem is very simple indeed.

Theorem 12.4.1. Two complex quadratic forms are equivalent with respect to the group of complex non-singular linear transformations if and only if they have the same rank.†

If the two quadratic forms are equivalent (in the sense stated), then their ranks are obviously equal. Consider next two quadratic forms $\phi$, $\psi$, both of rank $r$. In view of Exercise 12.3.4 (p. 369) both these forms are equivalent to $x_1^2+\dots+x_r^2$; hence they are equivalent to each other.
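Numerically, Theorem 12.4.1 reduces $T_c$-equivalence to a rank comparison. The following is a minimal sketch, assuming NumPy; `complex_equivalent` is a hypothetical name, and because floating-point rank needs a cut-off, the tolerance is an arbitrary choice. The last lines show the substitution $x_2 \to ix_2$, which turns $x_1^2+x_2^2$ into $x_1^2-x_2^2$ (note $P^T$, not the conjugate transpose, since these are quadratic forms).

```python
import numpy as np

def complex_equivalent(A, B, tol=1e-10):
    """Theorem 12.4.1: x^T A x and x^T B x (A, B symmetric, same size) are
    equivalent under complex non-singular substitutions iff ranks agree."""
    A = np.asarray(A, dtype=complex)
    B = np.asarray(B, dtype=complex)
    if A.shape != B.shape:
        return False
    return np.linalg.matrix_rank(A, tol=tol) == np.linalg.matrix_rank(B, tol=tol)

# An explicit reducing matrix: x_1 = y_1, x_2 = i y_2.
P = np.diag([1, 1j])
B = P.T @ np.eye(2) @ P        # matrix of the transformed form x1^2 + x2^2
```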

Exercise 12.4.1. Show that an n-ary quadratic form is non-singular if and only if it is equivalent, under the group $T_c$, to the n-ary unit form.

12.4.2. Theorem 12.4.1 settles the problem of equivalence with respect to $T_c$, since it shows that each equivalence class consists simply of all n-ary quadratic forms having the same rank. For real transformations the problem is more difficult and more interesting. The basis of our discussion is a result found by Sylvester in 1852.

† Cf. Theorem 12.3.2 (p. 368).

Theorem 12.4.2. (Sylvester’s law of inertia)

If a real quadratic form in $x_1,\dots,x_n$, of rank $r$, is reduced by two real non-singular linear transformations to the diagonal forms

$$\alpha_1y_1^2+\dots+\alpha_ry_r^2, \tag{12.4.2}$$

$$\beta_1z_1^2+\dots+\beta_rz_r^2 \tag{12.4.3}$$

respectively, then the number of positive $\alpha$’s is equal to the number of positive $\beta$’s.†

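Let the numbers of positive $\alpha$’s and positive $\beta$’s be denoted by $s$, $t$ respectively. The $y$’s and $z$’s may be assumed to be so numbered that the positive $\alpha$’s and positive $\beta$’s come first. Since the forms (12.4.2) and (12.4.3) are obtained by transformations of the same initial quadratic form, we have

$$\alpha_1y_1^2+\dots+\alpha_ry_r^2 = \beta_1z_1^2+\dots+\beta_rz_r^2. \tag{12.4.4}$$

This relation is an identity in $x_1,\dots,x_n$, since the $y$’s and $z$’s are linear forms in the $x$’s.

Suppose that $s > t$. Then the system of $n-s+t < n$ linear homogeneous equations

$$y_{s+1} = 0,\ \dots,\ y_n = 0,\qquad z_1 = 0,\ \dots,\ z_t = 0 \tag{12.4.5}$$

in the unknowns $x_1,\dots,x_n$ possesses a solution

$$x_1 = \xi_1,\ \dots,\ x_n = \xi_n \tag{12.4.6}$$

in which $\xi_1,\dots,\xi_n$ are not all zero.

Denote by $y_k^*$, $z_k^*$ $(k = 1,\dots,n)$ the values assumed by the linear forms $y_k$, $z_k$ respectively when the substitution (12.4.6) has been made. Then, by (12.4.4) and (12.4.5),

$$\alpha_1y_1^{*2}+\dots+\alpha_sy_s^{*2} = \beta_{t+1}z_{t+1}^{*2}+\dots+\beta_rz_r^{*2},$$

and therefore, since the left-hand side is non-negative and the right-hand side is non-positive,

$$y_1^* = \dots = y_s^* = 0. \tag{12.4.7}$$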

Hence, by (12.4.5) and (12.4.7),

$$y_1^* = \dots = y_s^* = y_{s+1}^* = \dots = y_n^* = 0,$$

and so the system of $n$ linear homogeneous equations $y_1 = 0, \dots, y_n = 0$ in the unknowns $x_1,\dots,x_n$ possesses a non-trivial solution (12.4.6). Hence the determinant of the linear forms $y_1,\dots,y_n$ vanishes, and this is incompatible with our hypothesis that the original quadratic form is transformed into (12.4.2) by a non-singular linear transformation. Thus the assumption $s > t$ leads to a contradiction; and by symmetry so does the assumption $s < t$. Hence $s = t$, and the theorem is proved. It follows at once, of course, that the number of negative $\alpha$’s is equal to the number of negative $\beta$’s.

† In view of Theorem 12.3.4 (p. 370) all $\alpha$’s and $\beta$’s are non-zero.

It is convenient to introduce the following definition. 

Definition 12.4.2. If a quadratic form $\phi$ is transformed by a real non-singular linear transformation into a diagonal quadratic form $\phi_1$, then $\phi_1$ is said to be a canonical form of $\phi$.

Exercise 12.4.2. Show that two real quadratic forms are equivalent with respect to $T_r$ if and only if they possess a common canonical form.

Sylvester’s law of inertia asserts that if $\phi_1$, $\phi_2$ are any two canonical forms of a quadratic form $\phi$, then the number of positive coefficients in $\phi_1$ is equal to the number of positive coefficients in $\phi_2$, and similarly for negative coefficients. We are therefore entitled to introduce the following notation.

Definition 12.4.3. The number of positive and negative coefficients in any canonical form of the quadratic form $x^TAx$ will be denoted by $P = P(A)$ and $N = N(A)$ respectively.

The numbers P and N so defined can easily be related to the 
characteristic roots of A. 

Theorem 12.4.3. If A is a real symmetric matrix, then P(A) is 
equal to the number of positive and N(A) to the number of negative 
characteristic roots of A. 

If $R(A) = r$ and $\lambda_1,\dots,\lambda_r$ are the non-zero characteristic roots of $A$, then, by the theorem on orthogonal reduction of quadratic forms (p. 363), we know that $x^TAx$ possesses the canonical form $\lambda_1x_1^2+\dots+\lambda_rx_r^2$. The assertion therefore follows.

Definition 12.4.4. The number $s = P(A)-N(A)$ is called the SIGNATURE of the quadratic form $x^TAx$ (or of the symmetric matrix $A$).
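Theorem 12.4.3 gives a direct way to compute $P$, $N$, $r$ and $s$ from the characteristic roots, and Sylvester’s law predicts that these numbers do not change under $A \mapsto P^TAP$ with $P$ real and non-singular. The sketch below (NumPy assumed; `inertia` is a hypothetical name) checks this on one example. The relative tolerance used to decide which computed roots count as zero is an assumption, not part of the theory.

```python
import numpy as np

def inertia(A, rel_tol=1e-10):
    """(P, N, r, s) for a real symmetric A, from the signs of its
    characteristic roots (Theorem 12.4.3)."""
    eig = np.linalg.eigvalsh(np.asarray(A, dtype=float))
    tol = rel_tol * max(1.0, np.abs(eig).max())
    P = int(np.sum(eig > tol))
    N = int(np.sum(eig < -tol))
    return P, N, P + N, P - N

rng = np.random.default_rng(0)
A = np.diag([3.0, 1.0, -2.0, 0.0])   # 3x1^2 + x2^2 - 2x3^2, rank 3
P = rng.standard_normal((4, 4))      # a random (almost surely non-singular) real matrix
B = P.T @ A @ P                      # matrix of the transformed form
```

The roots of $B$ differ from those of $A$, but by the law of inertia their sign pattern, and hence $(P, N, r, s) = (2, 1, 3, 1)$, is the same.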




Since P+N = r, P—N = s, any two of r, s, P, N determine the 
other two. The evaluation of r, s, P, N is carried out particularly 
easily by the use of Lagrange’s method of reduction. 
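Lagrange’s method completes squares in the form itself; in matrix language the same bookkeeping is a symmetric elimination in which every row operation is followed by the matching column operation, so the matrix stays congruent to $A$. The sketch below is such a matrix reformulation, not a transcription of the text’s procedure; the function names are hypothetical. A zero pivot is handled as in Lagrange’s method: bring in a non-vanishing square term if one exists, otherwise exploit a cross term $x_kx_j$. Exact fractions avoid any tolerance.

```python
from fractions import Fraction

def lagrange_diagonal(A):
    """Diagonal coefficients of a canonical form of x^T A x (A symmetric),
    obtained by congruence operations in exact rational arithmetic."""
    n = len(A)
    M = [[Fraction(v) for v in row] for row in A]
    for k in range(n):
        if M[k][k] == 0:
            j = next((j for j in range(k + 1, n) if M[j][j] != 0), None)
            if j is not None:
                M[k], M[j] = M[j], M[k]              # swap rows k and j
                for row in M:
                    row[k], row[j] = row[j], row[k]  # swap columns k and j
            else:
                j = next((j for j in range(k + 1, n) if M[k][j] != 0), None)
                if j is None:
                    continue                         # x_k no longer occurs
                for c in range(n):
                    M[k][c] += M[j][c]               # row k += row j
                for row in M:
                    row[k] += row[j]                 # column k += column j
                # now M[k][k] equals twice the old M[k][j], which is non-zero
        pivot = M[k][k]
        for i in range(k + 1, n):
            f = M[i][k] / pivot
            if f != 0:
                for c in range(n):
                    M[i][c] -= f * M[k][c]           # row i -= f * row k
                for row in M:
                    row[i] -= f * row[k]             # column i -= f * column k
    return [M[k][k] for k in range(n)]

def rank_signature(A):
    """(P, N, r, s) read off from the canonical form found above."""
    d = lagrange_diagonal(A)
    P = sum(1 for x in d if x > 0)
    N = sum(1 for x in d if x < 0)
    return P, N, P + N, P - N
```

For example, $2x_1x_2+2x_2x_3+2x_3x_1$ has no square terms at all; the cross-term step handles it and yields the canonical coefficients $2, -\tfrac12, -2$.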

Exercise 12.4.3. Determine the values of $P$, $N$, $r$, $s$ for the quaternary quadratic form $2x_1x_2-x_1x_3+x_1x_4-x_2x_3+x_2x_4-2x_3x_4$ considered on p. 373.


We are now able to formulate the solution of our principal 
problem. 

Theorem 12.4.4. (Equivalence theorem for quadratic forms) 

Two real quadratic forms are equivalent with respect to the group 
of real non-singular linear transformations if and only if they have 
the same rank and the same signature. 


This result should be compared with Theorem 12.4.1, where a less restrictive condition (i.e. equality of rank) suffices to characterize a less restrictive definition of equivalence (i.e. equivalence with respect to $T_c$).

To prove Theorem 12.4.4 we first observe that if two quadratic forms are equivalent (with respect to $T_r$), then, by Exercise 12.4.2, they possess a common canonical form and so have the same rank and the same signature.

Next, let $x^TAx$ be a quadratic form of rank $r$, and write $P(A) = p$. We know that $x^TAx$ is equivalent to

$$\alpha_1x_1^2+\dots+\alpha_px_p^2-\alpha_{p+1}x_{p+1}^2-\dots-\alpha_rx_r^2, \tag{12.4.8}$$

where all $\alpha$’s are positive. The additional non-singular linear transformation

$$x_k = y_k/\sqrt{\alpha_k}\quad (k = 1,\dots,r), \qquad x_k = y_k\quad (k = r+1,\dots,n)$$

carries (12.4.8) into

$$y_1^2+\dots+y_p^2-y_{p+1}^2-\dots-y_r^2, \tag{12.4.9}$$

and $x^TAx$ is therefore equivalent to (12.4.9). If now a second quadratic form $x^TBx$ has the same rank and the same signature as $x^TAx$, then $P(B) = P(A)$ (since $P = \tfrac12(r+s)$), and so $x^TBx$ is also equivalent to (12.4.9). Hence $x^TAx$ and $x^TBx$ are equivalent to each other, and the proof is complete. Thus each equivalence class with respect to the group $T_r$ consists of all n-ary quadratic forms having the same rank and the same signature.

Exercise 12.4.4. Show that there are altogether $n+1$ equivalence classes with respect to $T_c$ and $\tfrac12(n+1)(n+2)$ equivalence classes with respect to $T_r$.
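By Theorems 12.4.1 and 12.4.4, a class under $T_c$ is labelled by the rank $r$, and a class under $T_r$ by a pair $(r, p)$ with $0 \le p \le r \le n$, represented by the form (12.4.9). The short enumeration below (hypothetical helper names) lists the standard forms and confirms the counts stated in Exercise 12.4.4 for small $n$.

```python
def complex_classes(n):
    """One T_c-class per rank r = 0, 1, ..., n (Theorem 12.4.1)."""
    return list(range(n + 1))

def real_classes(n):
    """One T_r-class per pair (r, p) with 0 <= p <= r <= n (Theorem 12.4.4)."""
    return [(r, p) for r in range(n + 1) for p in range(r + 1)]

def standard_form(r, p):
    """The representative y1^2 + ... + yp^2 - y(p+1)^2 - ... - yr^2 as a string."""
    if r == 0:
        return "0"
    out = "y1^2" if p >= 1 else "-y1^2"
    for k in range(2, r + 1):
        out += (" + " if k <= p else " - ") + f"y{k}^2"
    return out

for n in range(1, 7):
    assert len(complex_classes(n)) == n + 1
    assert len(real_classes(n)) == (n + 1) * (n + 2) // 2
```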



Exercise 12.4.5. Show that an n-ary quadratic form is equivalent (with respect to $T_r$) to the n-ary unit form if and only if its rank and its signature are both equal to $n$.

12.4.3. In conclusion we mention, without proof, two further results in which rank and signature play a part. Let

$$a_0x^n+a_1x^{n-1}+\dots+a_n = 0 \tag{12.4.10}$$

be an equation with complex coefficients, and let $s_r$ denote the sum of the $r$th powers of its roots. Then the rank of the (symmetric) matrix

$$A = \begin{pmatrix} s_{2n-2} & s_{2n-3} & \cdots & s_{n-1}\\ s_{2n-3} & s_{2n-4} & \cdots & s_{n-2}\\ \vdots & \vdots & & \vdots\\ s_{n-1} & s_{n-2} & \cdots & s_0 \end{pmatrix}$$

is equal to the number of distinct roots of (12.4.10). This implies, in particular, that (12.4.10) has at least two coincident roots if and only if $A$ is singular, a result with which we are already familiar from § 1.4.1. If the coefficients of (12.4.10) are real, then $A$ is real and symmetric and its signature is equal to the number of distinct real roots of (12.4.10).†
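These two statements can be checked numerically on a polynomial whose roots are known. The sketch below assumes NumPy; it builds $A$ exactly as displayed, with $(i, j)$ entry $s_{2n-2-i-j}$ (indices from 0), from a root list given with multiplicities, and reads off rank and signature from the characteristic roots with an arbitrary relative tolerance. The example polynomial $(x-1)^2(x-2)(x^2+1)$ is chosen here purely for illustration: it has real coefficients, 4 distinct roots, and 2 distinct real roots.

```python
import numpy as np

def power_sum_matrix(roots):
    """n x n matrix of 12.4.3: entry (i, j) is s_{2n-2-i-j}, where s_k is the
    sum of k-th powers of the n roots (listed with multiplicity)."""
    roots = np.asarray(roots, dtype=complex)
    n = len(roots)
    s = [np.sum(roots ** k) for k in range(2 * n - 1)]
    return np.array([[s[2 * n - 2 - i - j] for j in range(n)] for i in range(n)])

def rank_and_signature(A, rel_tol=1e-9):
    """Rank and signature of a real symmetric matrix via its characteristic roots."""
    eig = np.linalg.eigvalsh(A)
    tol = rel_tol * max(1.0, np.abs(eig).max())
    pos, neg = int(np.sum(eig > tol)), int(np.sum(eig < -tol))
    return pos + neg, pos - neg

# Roots of (x - 1)^2 (x - 2)(x^2 + 1); real coefficients make every s_k real,
# so only rounding noise is discarded by taking the real part.
A = power_sum_matrix([1, 1, 2, 1j, -1j]).real
```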

12.5. Classification of quadrics

The equivalence theory of quadratic forms plays an important part in geometry and enables us, in particular, to classify quadrics with respect to various groups of transformations. We shall briefly discuss some of the principal modes of classification.

(i) Projective classification

Let the equation of a quadric be written as

$$x^TAx = 0, \tag{12.5.1}$$

where $A$ is a symmetric $4\times 4$ matrix, $x = (x_0, x_1, x_2, x_3)^T$, and $x_0, x_1, x_2, x_3$ are projective coordinates. Two quadrics are said to be projectively equivalent if the equation of one of them can be transformed into that of the other by a (complex) projective collineation, i.e. by a transformation

$$x' = Px, \tag{12.5.2}$$

where $P$ is a non-singular complex $4\times 4$ matrix. Thus projective equivalence of quadrics means equivalence of the associated quadratic forms with respect to the group $T_c$. Two quadrics are therefore projectively equivalent if and only if the associated quadratic forms have the same rank. Accordingly there are just four equivalence classes; and these are exhibited in the following table, where $r$ denotes the rank of $A$.

r | Standard equation | Type of quadric
4 | $x_0^2+x_1^2+x_2^2+x_3^2 = 0$ | Proper quadric
3 | $x_0^2+x_1^2+x_2^2 = 0$ | Quadric cone
2 | $x_0^2+x_1^2 = 0$ | Pair of distinct planes
1 | $x_0^2 = 0$ | Repeated plane

† For a proof of these and related results see Perron, 12, ii, 2-5.

(ii) Complex affine classification

Suppose that (12.5.1) is again the equation of a quadric, but that now $x_0, x_1, x_2, x_3$ denote homogeneous affine coordinates, with $x_0 = 0$ as the equation of the plane at infinity. Write, moreover,

$$A = \begin{pmatrix} a_{00} & a_{01} & a_{02} & a_{03}\\ a_{10} & a_{11} & a_{12} & a_{13}\\ a_{20} & a_{21} & a_{22} & a_{23}\\ a_{30} & a_{31} & a_{32} & a_{33}\end{pmatrix}, \qquad A' = \begin{pmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33}\end{pmatrix},$$

so that $A'$ is a matrix specifying the conic of intersection of the quadric (12.5.1) and the plane at infinity. A complex affine collineation is a complex non-singular linear transformation of coordinates which transforms the plane at infinity into itself, i.e. a non-singular transformation of the type

$$\begin{aligned} x_0' &= p_{00}x_0,\\ x_k' &= p_{k0}x_0+p_{k1}x_1+p_{k2}x_2+p_{k3}x_3 \qquad (k = 1, 2, 3). \end{aligned} \tag{12.5.3}$$

In matrix form this may be written as (12.5.2), where

$$P = \begin{pmatrix} p_{00} & 0 & 0 & 0\\ p_{10} & p_{11} & p_{12} & p_{13}\\ p_{20} & p_{21} & p_{22} & p_{23}\\ p_{30} & p_{31} & p_{32} & p_{33}\end{pmatrix}$$

is a complex non-singular matrix. We shall discuss equivalence with respect to the group of complex affine collineations, and shall determine all equivalence classes and the corresponding standard forms of the equations of quadrics.

Write $R(A) = r$, $R(A') = \rho$. We know, by virtue of Exercise 12.3.4 (p. 369), that there exists a non-singular transformation

$$x_0' = x_0, \qquad x_k' = c_{k1}x_1+c_{k2}x_2+c_{k3}x_3 \quad (k = 1, 2, 3)$$

which reduces the quadratic form $x^TAx$ to

$$a_{00}x_0^2+2b_{01}x_0x_1+2b_{02}x_0x_2+2b_{03}x_0x_3+x_1^2+\dots+x_\rho^2,$$

and this quadratic form may be written as

$$(a_{00}-b_{01}^2-\dots-b_{0\rho}^2)x_0^2+2b_{0,\rho+1}x_0x_{\rho+1}+\dots+2b_{03}x_0x_3+(x_1+b_{01}x_0)^2+\dots+(x_\rho+b_{0\rho}x_0)^2.$$

Applying the transformation

$$x_k' = x_k+b_{0k}x_0 \quad (k = 1,\dots,\rho), \qquad x_k' = x_k \quad (k = 0, \rho+1,\dots,3)$$

and writing $a_{00}-b_{01}^2-\dots-b_{0\rho}^2 = b$, we obtain the quadratic form

$$bx_0^2+2b_{0,\rho+1}x_0x_{\rho+1}+\dots+2b_{03}x_0x_3+x_1^2+\dots+x_\rho^2. \tag{12.5.4}$$

This last expression can be simplified still further; and we need to consider three cases, each of which gives rise to a distinct type of quadratic form.

(i) When $b_{0,\rho+1} = \dots = b_{03} = b = 0$ (or when $\rho = 3$ and $b = 0$), (12.5.4) becomes

$$x_1^2+\dots+x_\rho^2.$$

(ii) When $b_{0,\rho+1} = \dots = b_{03} = 0$ (or $\rho = 3$) and $b \neq 0$, then (12.5.4) can be transformed into

$$x_0^2+x_1^2+\dots+x_\rho^2.$$

(iii) When $b_{0,\rho+1},\dots,b_{03}$ are not all zero we may assume, without loss of generality, that $b_{0,\rho+1} \neq 0$. Putting

$$x_{\rho+1}' = bx_0+2b_{0,\rho+1}x_{\rho+1}+\dots+2b_{03}x_3$$

and leaving the other $x$’s unchanged in (12.5.4), we obtain the quadratic form

$$x_0x_{\rho+1}+x_1^2+\dots+x_\rho^2.$$

Since all transformations concerned are complex affine collineations, it follows that the original quadratic form $x^TAx$ is equivalent, with respect to the group of these collineations, to one or other of the forms

$$x_1^2+\dots+x_\rho^2, \tag{12.5.5}$$

$$x_0^2+x_1^2+\dots+x_\rho^2, \tag{12.5.6}$$

$$x_0x_{\rho+1}+x_1^2+\dots+x_\rho^2. \tag{12.5.7}$$

Moreover, no two of these quadratic forms are equivalent to each other, for $r$ is invariant under collineations, $\rho$ is invariant under affine collineations (which change $A'$ by a non-singular congruence), and we have $r = \rho$, $r = \rho+1$, $r = \rho+2$ for the forms (12.5.5), (12.5.6), (12.5.7) respectively. It follows that two quadrics $x^TAx = 0$ and $x^TBx = 0$ are equivalent with respect to the group of complex affine collineations if and only if the values of $r$ and $\rho$ are the same for $A$ as for $B$. The equivalence classes are thus completely specified by the values of $r$ and $\rho$, and each class has a representative of the form (12.5.5), (12.5.6), or (12.5.7). These classes are exhibited in the following table.


r | ρ | Standard equation | Type of quadric
4 | 3 | $x_0^2+x_1^2+x_2^2+x_3^2 = 0$ | Central quadric
4 | 2 | $x_1^2+x_2^2+x_0x_3 = 0$ | Paraboloid
3 | 3 | $x_1^2+x_2^2+x_3^2 = 0$ | Quadric cone
3 | 2 | $x_0^2+x_1^2+x_2^2 = 0$ | Central cylinder
3 | 1 | $x_1^2+x_0x_2 = 0$ | Parabolic cylinder
2 | 2 | $x_1^2+x_2^2 = 0$ | Pair of intersecting planes
2 | 1 | $x_0^2+x_1^2 = 0$ | Pair of parallel planes
2 | 0 | $x_0x_1 = 0$ | A plane together with the plane at infinity
1 | 1 | $x_1^2 = 0$ | Repeated plane
1 | 0 | $x_0^2 = 0$ | Plane at infinity repeated
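Because the complex affine class depends only on $r = R(A)$ and $\rho = R(A')$, classification is a lookup once both ranks are known. A minimal sketch, assuming NumPy and an absolute rank tolerance chosen arbitrarily; the dictionary simply transcribes the table, and the function name is hypothetical.

```python
import numpy as np

# (r, rho) -> type, transcribed from the complex affine table.
COMPLEX_AFFINE_TYPES = {
    (4, 3): "Central quadric", (4, 2): "Paraboloid",
    (3, 3): "Quadric cone", (3, 2): "Central cylinder", (3, 1): "Parabolic cylinder",
    (2, 2): "Pair of intersecting planes", (2, 1): "Pair of parallel planes",
    (2, 0): "A plane together with the plane at infinity",
    (1, 1): "Repeated plane", (1, 0): "Plane at infinity repeated",
}

def complex_affine_type(A, tol=1e-10):
    """Type of x^T A x = 0 (A symmetric 4x4, x_0 = 0 the plane at infinity),
    using r = R(A) and rho = R(A'), A' being the lower-right 3x3 block."""
    A = np.asarray(A, dtype=complex)
    r = int(np.linalg.matrix_rank(A, tol=tol))
    rho = int(np.linalg.matrix_rank(A[1:, 1:], tol=tol))
    return COMPLEX_AFFINE_TYPES.get((r, rho), "no quadric (A = 0)")

paraboloid = np.zeros((4, 4))
paraboloid[1, 1] = paraboloid[2, 2] = 1.0        # x1^2 + x2^2
paraboloid[0, 3] = paraboloid[3, 0] = 0.5        # + x0 x3
```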


(iii) Real affine classification

So far we have been considering complex quadrics, i.e. quadrics specified by complex matrices. We shall now confine our attention to real quadrics. We shall continue to write the equation of a quadric in the form (12.5.1), where $A$ is a real $4\times 4$ matrix and $x_0, x_1, x_2, x_3$ denote homogeneous affine coordinates, with $x_0 = 0$ again as the equation of the plane at infinity. A real affine collineation is a real non-singular linear transformation of the type (12.5.3). Our object is to find all equivalence classes with respect to the group of real affine collineations.

Projective classification and complex affine classification depend on complex transformations, and considerations of rank are sufficient for determining all equivalence classes. In the case of real affine classification, however, account must be taken of the signs of the various terms which appear in the equations of the quadrics; and use must be made, therefore, of the notion of signature. We shall denote by $s$, $\sigma$ the moduli of the signatures of $A$, $A'$ respectively. The symbols $r$, $\rho$ will have the same significance as before.

Arguing as in (ii), we see that any quadratic form $x^TAx$ can be transformed, by a real affine collineation, into a quadratic form of one of the following types:

$$\epsilon_1x_1^2+\dots+\epsilon_\rho x_\rho^2;$$

$$\epsilon_0x_0^2+\epsilon_1x_1^2+\dots+\epsilon_\rho x_\rho^2;$$

$$x_0x_{\rho+1}+\epsilon_1x_1^2+\dots+\epsilon_\rho x_\rho^2.$$

Here each $\epsilon$ has the value 1 or $-1$. We do not need to consider all possible combinations of values of the $\epsilon$’s, since our main concern is with the equation $x^TAx = 0$ rather than with the quadratic form $x^TAx$, and any choice of values of $\epsilon_0, \epsilon_1, \dots, \epsilon_\rho$ leads to the same equation as the choice $-\epsilon_0, -\epsilon_1, \dots, -\epsilon_\rho$. Bearing this in mind, we can write out the table of standard equations. In this table we note down in each case the values of $r$, $\rho$, $s$, and $\sigma$.


r | ρ | s | σ | Standard equation | Type of quadric
4 | 3 | 4 | 3 | $x_0^2+x_1^2+x_2^2+x_3^2 = 0$ | Virtual quadric
4 | 3 | 2 | 3 | $-x_0^2+x_1^2+x_2^2+x_3^2 = 0$ | Ellipsoid
4 | 3 | 2 | 1 | $x_0^2+x_1^2+x_2^2-x_3^2 = 0$ | Hyperboloid of two sheets
4 | 3 | 0 | 1 | $-x_0^2+x_1^2+x_2^2-x_3^2 = 0$ | Hyperboloid of one sheet
4 | 2 | 2 | 2 | $x_1^2+x_2^2+x_0x_3 = 0$ | Elliptic paraboloid
4 | 2 | 0 | 0 | $x_1^2-x_2^2+x_0x_3 = 0$ | Hyperbolic paraboloid
3 | 3 | 3 | 3 | $x_1^2+x_2^2+x_3^2 = 0$ | Virtual quadric with a single real point
3 | 3 | 1 | 1 | $x_1^2+x_2^2-x_3^2 = 0$ | Quadric cone
3 | 2 | 3 | 2 | $x_0^2+x_1^2+x_2^2 = 0$ | Virtual cylinder
3 | 2 | 1 | 2 | $-x_0^2+x_1^2+x_2^2 = 0$ | Elliptic cylinder
3 | 2 | 1 | 0 | $x_0^2+x_1^2-x_2^2 = 0$ | Hyperbolic cylinder
3 | 1 | 1 | 1 | $x_1^2+x_0x_2 = 0$ | Parabolic cylinder
2 | 2 | 2 | 2 | $x_1^2+x_2^2 = 0$ | Pair of virtual planes intersecting in a real line
2 | 2 | 0 | 0 | $x_1^2-x_2^2 = 0$ | Pair of intersecting planes
2 | 1 | 2 | 1 | $x_0^2+x_1^2 = 0$ | Pair of parallel virtual planes
2 | 1 | 0 | 1 | $-x_0^2+x_1^2 = 0$ | Pair of parallel planes
2 | 0 | 0 | 0 | $x_0x_1 = 0$ | A plane together with the plane at infinity
1 | 1 | 1 | 1 | $x_1^2 = 0$ | Repeated plane
1 | 0 | 1 | 0 | $x_0^2 = 0$ | Plane at infinity repeated


It is clear that the numbers $r$, $\rho$, $s$, $\sigma$ are invariant under the group of real affine collineations. The above table therefore shows that no two quadrics having different standard equations are equivalent with respect to this group. Thus each equivalence class is specified by the set of values of $r$, $\rho$, $s$, $\sigma$; and the table gives one representative for each class.
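Since each real affine class is pinned down by $(r, \rho, s, \sigma)$, a real quadric can be classified by computing these four invariants and looking them up. The following is a sketch under stated assumptions: NumPy is available, ranks and signatures are read from the signs of the characteristic roots with an arbitrary relative tolerance, and the dictionary transcribes the table (each entry can be rechecked from its standard equation). The names are hypothetical.

```python
import numpy as np

# (r, rho, s, sigma) -> type, transcribed from the real affine table.
REAL_AFFINE_TYPES = {
    (4, 3, 4, 3): "Virtual quadric",
    (4, 3, 2, 3): "Ellipsoid",
    (4, 3, 2, 1): "Hyperboloid of two sheets",
    (4, 3, 0, 1): "Hyperboloid of one sheet",
    (4, 2, 2, 2): "Elliptic paraboloid",
    (4, 2, 0, 0): "Hyperbolic paraboloid",
    (3, 3, 3, 3): "Virtual quadric with a single real point",
    (3, 3, 1, 1): "Quadric cone",
    (3, 2, 3, 2): "Virtual cylinder",
    (3, 2, 1, 2): "Elliptic cylinder",
    (3, 2, 1, 0): "Hyperbolic cylinder",
    (3, 1, 1, 1): "Parabolic cylinder",
    (2, 2, 2, 2): "Pair of virtual planes intersecting in a real line",
    (2, 2, 0, 0): "Pair of intersecting planes",
    (2, 1, 2, 1): "Pair of parallel virtual planes",
    (2, 1, 0, 1): "Pair of parallel planes",
    (2, 0, 0, 0): "A plane together with the plane at infinity",
    (1, 1, 1, 1): "Repeated plane",
    (1, 0, 1, 0): "Plane at infinity repeated",
}

def _rank_and_abs_signature(M, rel_tol=1e-10):
    eig = np.linalg.eigvalsh(M)
    tol = rel_tol * max(1.0, np.abs(eig).max())
    pos, neg = int(np.sum(eig > tol)), int(np.sum(eig < -tol))
    return pos + neg, abs(pos - neg)

def real_affine_type(A):
    """Type of the real quadric x^T A x = 0 (x_0 = 0 the plane at infinity)."""
    A = np.asarray(A, dtype=float)
    r, s = _rank_and_abs_signature(A)
    rho, sigma = _rank_and_abs_signature(A[1:, 1:])
    return REAL_AFFINE_TYPES.get((r, rho, s, sigma), "no quadric (A = 0)")

# The sphere (x1 - 1)^2 + x2^2 + x3^2 = 1, i.e. x1^2 + x2^2 + x3^2 - 2 x0 x1 = 0.
shifted_sphere = np.array([[0.0, -1.0, 0.0, 0.0],
                           [-1.0, 1.0, 0.0, 0.0],
                           [0.0, 0.0, 1.0, 0.0],
                           [0.0, 0.0, 0.0, 1.0]])
```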

The different types of equivalence we have discussed give rise to a hierarchy of classifications. Thus all proper quadrics are projectively equivalent, but with respect to the group of complex affine collineations they separate into central quadrics and paraboloids. Again, with respect to the group of real affine collineations, central quadrics separate into ellipsoids, hyperboloids of one sheet, and hyperboloids of two sheets, while paraboloids separate into those of the elliptic and those of the hyperbolic type.













Exercise 12.5.1. Discuss the projective and affine classification of conics.

Exercise 12.5.2. Extend to n-ary quadratic forms the reduction of quaternary forms carried out in (ii) and (iii).

(iv) Metric classification

By specializing further the group of transformations, we are led to refinements of the previous systems of classification. By far the most important system that arises in this way is that of ‘metric classification’.

Let (12.5.1) again be the equation of the quadric, with $x_0, x_1, x_2, x_3$ now interpreted as homogeneous rectangular coordinates, and with $x_0 = 0$ as the equation of the plane at infinity. A euclidean collineation (i.e. a combination of translations, rotations, and reflections) can be shown to be a transformation of the type

$$x_0' = x_0, \qquad x_k' = p_{k0}x_0+p_{k1}x_1+p_{k2}x_2+p_{k3}x_3 \quad (k = 1, 2, 3),$$

where

$$\begin{pmatrix} p_{11} & p_{12} & p_{13}\\ p_{21} & p_{22} & p_{23}\\ p_{31} & p_{32} & p_{33}\end{pmatrix}$$

is an orthogonal matrix. Two quadrics are said to be metrically equivalent if they are equivalent with respect to the group of euclidean collineations. The number of equivalence classes is, of course, infinite, and the individual classes cannot now be specified by means of ‘arithmetic’ invariants, such as rank or signature. The problem of classification involves the study of ‘algebraic’ invariants, which are polynomials in the coefficients $a_{rs}$ of the quadric $x^TAx = 0$. Thus $|A|$, $|A'|$, $a_{11}+a_{22}+a_{33}$, and

$$a_{22}a_{33}+a_{33}a_{11}+a_{11}a_{22}-a_{23}a_{32}-a_{31}a_{13}-a_{12}a_{21}$$

are all invariant under the group of euclidean collineations. The systematic discussion of such invariants falls, however, outside the scope of the present treatment.†
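The invariance of these four quantities can be spot-checked numerically. The sketch below assumes NumPy; `random_euclidean_collineation` is a hypothetical test helper that builds $P$ with $x_0' = x_0$, a random translation column $p_{k0}$, and an orthogonal $3\times 3$ block taken from a QR factorization. Under $x' = Px$ the matrix of the quadric becomes $(P^{-1})^TAP^{-1}$.

```python
import numpy as np

def metric_invariants(A):
    """|A|, |A'|, a11 + a22 + a33, and the sum of principal 2x2 minors of A'."""
    A = np.asarray(A, dtype=float)
    Ap = A[1:, 1:]                              # A' (indices 1..3 of the text)
    minors2 = (Ap[1, 1] * Ap[2, 2] + Ap[2, 2] * Ap[0, 0] + Ap[0, 0] * Ap[1, 1]
               - Ap[1, 2] * Ap[2, 1] - Ap[2, 0] * Ap[0, 2] - Ap[0, 1] * Ap[1, 0])
    return np.array([np.linalg.det(A), np.linalg.det(Ap), np.trace(Ap), minors2])

def random_euclidean_collineation(rng):
    Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))   # orthogonal block
    P = np.eye(4)
    P[1:, 0] = rng.standard_normal(3)                   # translation part p_k0
    P[1:, 1:] = Q
    return P

rng = np.random.default_rng(1)
A = np.array([[-4.0, 1.0, 0.5, 0.0],
              [1.0, 2.0, 0.3, -0.7],
              [0.5, 0.3, 1.0, 0.2],
              [0.0, -0.7, 0.2, 3.0]])
P = random_euclidean_collineation(rng)
Pinv = np.linalg.inv(P)
B = Pinv.T @ A @ Pinv          # matrix of the same quadric in the coordinates x' = P x
```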


† For further information see Sommerville, Analytical Geometry of Three Dimensions, 171-3 and 321-2.

12.6. Hermitian forms

12.6.1. The reason why in the preceding sections we confined ourselves largely to the discussion of real quadratic forms is that the most significant generalization of a real quadratic form is not a complex quadratic form but a ‘hermitian form’. The theory of hermitian forms (initiated by Hermite in 1854) closely resembles that of quadratic forms and we shall therefore deal with it rather summarily.

Definition 12.6.1. A HERMITIAN FORM $\phi$ in the (complex-valued) variables $x_1,\dots,x_n$ is a function of the type

$$\phi = \phi(x_1,\dots,x_n) = \sum_{r,s=1}^{n} a_{rs}\bar{x}_rx_s = \bar{x}^TAx,$$

where $A = (a_{rs})$ is a hermitian matrix and $x = (x_1,\dots,x_n)^T$.

There is therefore a biunique correspondence between n-ary hermitian forms and $n\times n$ hermitian matrices.

The value which a hermitian form assumes for any values of its variables is necessarily real, for

$$\overline{(\bar{x}^TAx)} = x^T\bar{A}\bar{x} = (x^T\bar{A}\bar{x})^T = \bar{x}^T\bar{A}^Tx = \bar{x}^TAx.$$

If the matrix $A$ of a hermitian form is real (and so symmetric), then $\bar{x}^TAx$ is called a real hermitian form. If, in addition, its variables are restricted to the real field, then the hermitian form becomes, in fact, the quadratic form $x^TAx$. A quadratic form may therefore be regarded as a special case of a hermitian form.

A substitution

$$x = Px' \tag{12.6.1}$$

for the variables of the hermitian form $\phi = \bar{x}^TAx$ is called a linear transformation of $\phi$, and it is said to be singular or non-singular according as $|P| = 0$ or $|P| \neq 0$. If $P$ is unitary, the substitution is called unitary. Unless the contrary is stated, all transformations of hermitian forms are understood to be complex. The transformation (12.6.1) changes the hermitian form $\bar{x}^TAx$ to

$$(\overline{Px'})^TA(Px') = \bar{x}'^T(\bar{P}^TAP)x'.$$

Since $\bar{P}^TAP$ is a hermitian matrix, the new function is again a hermitian form. We observe that the matrix $A$ associated with the original hermitian form is changed to

$$B = \bar{P}^TAP.$$

A matrix transformation of this type is known as a conjunctive transformation if $|P| \neq 0$. When $P$ is unitary this transformation becomes, of course, a unitary similarity transformation. Conjunctive transformations clearly form a group.

The rank of a hermitian form is defined as the rank of the 
associated (hermitian) matrix. It is plainly invariant with respect 
to non-singular linear transformations of the hermitian form. 



A hermitian form of type $\alpha_1\bar{x}_1x_1+\dots+\alpha_n\bar{x}_nx_n$ is called diagonal. Here the coefficients $\alpha_1,\dots,\alpha_n$ are necessarily real since they are the diagonal elements of a hermitian matrix. The form $\bar{x}_1x_1+\dots+\bar{x}_nx_n$ is known as the unit n-ary hermitian form.

Exercise 12.6.1. Express Theorem 8.2.5 (p. 230) in terms of hermitian 
forms. 

As in the theory of quadratic forms it is interesting to investigate 
reduction to diagonal form. Here the essential tool is Theorem 
10.3.7 which states that every hermitian matrix is unitarily similar 
to a diagonal matrix. This leads at once to the following result 
which is analogous to the orthogonal reduction theorem for 
quadratic forms (Theorem 12.2.1). 

Theorem 12.6.1. (Unitary reduction of hermitian forms)

Let $\bar{x}^TAx$ be a hermitian form of rank $r$. Then there exists a unitary transformation which reduces $\bar{x}^TAx$ to the diagonal form

$$\lambda_1\bar{x}_1x_1+\dots+\lambda_r\bar{x}_rx_r,$$

where $\lambda_1,\dots,\lambda_r$ are the non-vanishing characteristic roots of $A$.†
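Numerically, the unitary reducing matrix of Theorem 12.6.1 is what a hermitian eigensolver returns. The sketch below assumes NumPy: `numpy.linalg.eigh` gives real roots and a unitary $U$ with $\bar{U}^TAU$ diagonal, and the substitution $x = Uy$ turns the form into $\sum\lambda_k\bar{y}_ky_k$. It also illustrates that the form takes real values (up to rounding). A random hermitian matrix is almost surely non-singular, so here $r = n$; `herm_form` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(2)
Z = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
A = Z + Z.conj().T                    # a hermitian matrix

lam, U = np.linalg.eigh(A)            # real roots, unitary U with conj(U)^T A U = dg(lam)

def herm_form(A, x):
    """Value of the hermitian form conj(x)^T A x."""
    return np.vdot(x, A @ x)          # vdot conjugates its first argument

x = rng.standard_normal(4) + 1j * rng.standard_normal(4)
y = U.conj().T @ x                    # new variables, so that x = U y
```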

If we do not insist that the reducing matrix should be unitary, then the coefficients in the resulting diagonal form will not necessarily be equal to the characteristic roots of $A$, but the non-vanishing coefficients will still be $r$ in number, and they will all be real. The question as to their sign therefore arises just as in the case of quadratic forms, and we are led to a result analogous to Sylvester’s law of inertia (Theorem 12.4.2).

Theorem 12.6.2. (Law of inertia for hermitian forms)

If a hermitian form of rank $r$ is reduced by two complex non-singular linear transformations to the diagonal forms

$$\alpha_1\bar{x}_1x_1+\dots+\alpha_r\bar{x}_rx_r, \qquad \beta_1\bar{x}_1x_1+\dots+\beta_r\bar{x}_rx_r$$

respectively, then the number of positive $\alpha$’s is equal to the number of positive $\beta$’s.

The proof is virtually the same as that of the law of inertia for quadratic forms, and the details may be left to the reader.

The signature of a hermitian form is defined in precisely the same way as the signature of a quadratic form. It is then seen that rank and signature characterize completely all equivalence classes of hermitian forms.

† The number of non-vanishing characteristic roots is equal to $r$ by virtue of Theorem 10.3.7 and the corollary to Theorem 10.2.3 (p. 296).

Theorem 12.6.3. (Equivalence theorem for hermitian forms)

Two hermitian forms are equivalent with respect to the group of complex non-singular linear transformations if and only if they have the same rank and the same signature.

The connexion between rank, signature, and the signs of the characteristic roots of the associated matrix is the same for hermitian as for quadratic forms. We state the analogue of Theorem 12.4.3 (p. 378).

Theorem 12.6.4. (i) The rank of a hermitian form $\phi = \bar{x}^TAx$ is equal to the number of non-vanishing characteristic roots of $A$. (ii) The signature of $\phi$ is equal to the difference between the number of positive and the number of negative characteristic roots of $A$.
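Theorems 12.6.2 to 12.6.4 together say that a conjunctive transformation $B = \bar{P}^TAP$ with $|P| \neq 0$ preserves rank and signature even though it changes the characteristic roots. A numerical illustration, assuming NumPy and an arbitrary relative tolerance for deciding which computed roots vanish; `herm_rank_signature` is a hypothetical name.

```python
import numpy as np

def herm_rank_signature(A, rel_tol=1e-10):
    """Rank and signature of a hermitian matrix from the signs of its
    (real) characteristic roots, as in Theorem 12.6.4."""
    eig = np.linalg.eigvalsh(A)
    tol = rel_tol * max(1.0, np.abs(eig).max())
    pos, neg = int(np.sum(eig > tol)), int(np.sum(eig < -tol))
    return pos + neg, pos - neg

rng = np.random.default_rng(3)
A = np.diag([2.0, 1.0, -1.0, 0.0]).astype(complex)        # rank 3, signature 1
P = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))  # almost surely non-singular
B = P.conj().T @ A @ P                                     # conjunctive transformation
```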

Exercise 12.6.2. Give detailed proofs of all assertions made above. 

Exercise 12.6.3. Show that the following statement is false. ‘Every hermitian form of rank $r$ can be reduced to the diagonal form $\bar{x}_1x_1+\dots+\bar{x}_rx_r$ by a suitable complex non-singular linear transformation.’

Exercise 12.6.4. Extend Lagrange’s method of reduction to hermitian 
forms. 

12.6.2. The theory of hermitian forms can be used to narrow the bounds, previously obtained in § 7.5, for the real and imaginary parts of the characteristic roots of an arbitrary matrix. We first deduce a preliminary inequality for the values assumed by hermitian forms.

Theorem 12.6.5. If $\lambda$ and $\Lambda$ are the least and greatest characteristic roots of the hermitian matrix $A$, then, for all $x$,

$$\lambda\,\bar{x}^Tx \le \bar{x}^TAx \le \Lambda\,\bar{x}^Tx.$$

Denote the characteristic roots of $A$ by $\lambda_1,\dots,\lambda_n$, and write $D = \mathrm{dg}(\lambda_1,\dots,\lambda_n)$. By Theorem 10.3.7 there exists a unitary matrix $U$ such that $\bar{U}^TAU = D$. Let $x$ be any (complex) vector and put $y = \bar{U}^Tx$, so that $\bar{x}^TAx = \bar{y}^TDy = \lambda_1\bar{y}_1y_1+\dots+\lambda_n\bar{y}_ny_n$ and $\bar{x}^Tx = \bar{y}^Ty$. Since obviously

$$\lambda\,\bar{y}^Ty \le \bar{y}^TDy \le \Lambda\,\bar{y}^Ty,$$

the assertion follows at once.
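A quick numerical check of Theorem 12.6.5, assuming NumPy: for a random hermitian $A$, every sampled $x$ satisfies the two inequalities (up to a small rounding allowance `eps`, an assumption of the sketch), and equality is attained at a characteristic vector of the least root. `check_bounds` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(4)
Z = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
A = (Z + Z.conj().T) / 2                  # hermitian
lam, V = np.linalg.eigh(A)                # ascending: lam[0] least, lam[-1] greatest

def check_bounds(x, eps=1e-10):
    """least * conj(x)^T x <= conj(x)^T A x <= greatest * conj(x)^T x ?"""
    q = np.vdot(x, A @ x).real            # the form is real-valued
    n2 = np.vdot(x, x).real
    return lam[0] * n2 - eps <= q <= lam[-1] * n2 + eps

samples = [rng.standard_normal(5) + 1j * rng.standard_normal(5) for _ in range(1000)]
```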

From Theorem 12.6.5 we derive important inequalities due to Bromwich (1906).



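Theorem 12.6.6. Let $A$ be any complex matrix. Let $\mu$, $M$ be the least and greatest characteristic roots of the hermitian matrix $\tfrac12(A+\bar{A}^T)$ and $\nu$, $N$ the least and greatest characteristic roots of the hermitian matrix $\tfrac{1}{2i}(A-\bar{A}^T)$. If $\lambda$ is any characteristic root of $A$, then

$$\mu \le \operatorname{Re}\lambda \le M, \qquad \nu \le \operatorname{Im}\lambda \le N.$$

Let $x$ be a unit vector such that $Ax = \lambda x$. Then

$$\lambda = \lambda\,\bar{x}^Tx = \bar{x}^TAx,$$

$$\bar{\lambda} = \overline{\bar{x}^TAx} = (x^T\bar{A}\bar{x})^T = \bar{x}^T\bar{A}^Tx.$$

Hence

$$\operatorname{Re}\lambda = \tfrac12(\lambda+\bar{\lambda}) = \bar{x}^T\cdot\tfrac12(A+\bar{A}^T)\cdot x,$$

$$\operatorname{Im}\lambda = \tfrac{1}{2i}(\lambda-\bar{\lambda}) = \bar{x}^T\cdot\tfrac{1}{2i}(A-\bar{A}^T)\cdot x,$$

and the required inequalities follow by Theorem 12.6.5.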
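Bromwich’s bounds can be computed and compared with the actual characteristic roots. A minimal sketch, assuming NumPy; `bromwich_bounds` is a hypothetical name, and the small allowance `1e-9` only absorbs floating-point error in the computed roots.

```python
import numpy as np

def bromwich_bounds(A):
    """(mu, M, nu, N) of Theorem 12.6.6 for a square complex matrix A."""
    A = np.asarray(A, dtype=complex)
    H1 = (A + A.conj().T) / 2         # hermitian part
    H2 = (A - A.conj().T) / 2j        # also hermitian
    e1 = np.linalg.eigvalsh(H1)       # ascending order
    e2 = np.linalg.eigvalsh(H2)
    return e1[0], e1[-1], e2[0], e2[-1]

rng = np.random.default_rng(5)
A = rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6))
mu, M, nu, N = bromwich_bounds(A)
roots = np.linalg.eigvals(A)
```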


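Bromwich’s bounds for $\operatorname{Re}\lambda$ and $\operatorname{Im}\lambda$ are narrower than those of Hirsch and of Bendixson given in Theorems 7.5.2 and 7.5.3 (pp. 210-11). For, applying the first inequality of Theorem 7.5.3 to the matrices $\tfrac12(A+\bar{A}^T)$ and $\tfrac{1}{2i}(A-\bar{A}^T)$ respectively, we obtain

$$|\mu| \le n\sigma, \quad |M| \le n\sigma, \quad |\nu| \le n\tau, \quad |N| \le n\tau.$$

Hence

$$-n\sigma \le \mu \le \operatorname{Re}\lambda \le M \le n\sigma,$$

$$-n\tau \le \nu \le \operatorname{Im}\lambda \le N \le n\tau,$$

and Theorem 12.6.6 is therefore sharper than Theorem 7.5.3.

Next, let $A$ be real and denote the characteristic roots of $\tfrac{1}{2i}(A-A^T)$ by $\nu_1,\dots,\nu_n$. The characteristic roots of

$$B = \tfrac12(A-A^T) = (b_{rs})$$

are then $i\nu_1,\dots,i\nu_n$, and since the non-vanishing ones among them occur in conjugate pairs, we have $\nu_1+\dots+\nu_n = 0$, and so

$$\nu_1^2+\dots+\nu_n^2 = -2\sum_{1\le r<s\le n}\nu_r\nu_s.$$

Now, by Theorem 7.1.3 (p. 198),

$$\sum_{1\le r<s\le n}(i\nu_r)(i\nu_s) = \sum_{1\le r<s\le n}\begin{vmatrix} b_{rr} & b_{rs}\\ b_{sr} & b_{ss}\end{vmatrix},$$

i.e., since $b_{rr} = 0$ and $b_{sr} = -b_{rs}$,

$$-\sum_{1\le r<s\le n}\nu_r\nu_s = \sum_{1\le r<s\le n}b_{rs}^2.$$

Hence

$$\nu_1^2+\dots+\nu_n^2 = 2\sum_{1\le r<s\le n}b_{rs}^2 \le n(n-1)\alpha^2,$$

where $\alpha = \max_{r,s}\tfrac12|a_{rs}-a_{sr}|$. Now the value of any non-vanishing $\nu_k^2$ occurs at least twice among $\nu_1^2,\dots,\nu_n^2$, and therefore

$$|\nu_k| \le \alpha\sqrt{n(n-1)/2} \qquad (k = 1,\dots,n).$$

Thus

$$-\alpha\sqrt{n(n-1)/2} \le \nu \le \operatorname{Im}\lambda \le N \le \alpha\sqrt{n(n-1)/2},$$

and these inequalities demonstrate the superiority of Theorem 12.6.6 over Theorem 7.5.2. Nevertheless the bounds given by Theorems 7.5.2 and 7.5.3 are still useful, since $\mu$, $M$, $\nu$, $N$ are not known directly.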
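The steps of the real-matrix argument can each be checked numerically: the $\nu_k$ sum to zero, their squares sum to $2\sum_{r<s}b_{rs}^2$, each $|\nu_k|$ is at most $\alpha\sqrt{n(n-1)/2}$, and every $\operatorname{Im}\lambda$ lies in $[\nu, N]$. A sketch assuming NumPy, with a random real matrix and rounding allowances chosen arbitrarily.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 7
A = rng.standard_normal((n, n))                 # a real matrix
B = (A - A.T) / 2                               # B = (b_rs), skew-symmetric
nu = np.linalg.eigvalsh(B / 1j)                 # roots nu_1..nu_n of (A - A^T)/(2i)
alpha = np.max(np.abs(A - A.T)) / 2             # alpha = max |a_rs - a_sr| / 2
bound = alpha * np.sqrt(n * (n - 1) / 2)
im_parts = np.linalg.eigvals(A).imag
```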


PROBLEMS ON CHAPTER XII 

1. Determine the rank and signature of the ternary quadratic forms

(i) $2y^2-z^2+xy+xz$; (ii) $2xy-xz-yz$.

2. Determine the rank and signature of the following quaternary quadratic 
forms. 

(i) $yz+zx+xy+xt+yt+zt$;

(ii) x^ -i- 42/* 4- 42* — i*+ ^yz — 2zx 4- 2xy + 2xt 4- 2yt — 2zt ; 

(iii) 3(aj* 4” 2/*+4- 2aJ2 4- ^xt— 2yz — 2yt— 2zt, 

3. Suppose that $\phi = y_1^2+\dots+y_r^2$, where $y_1,\dots,y_r$ are linearly independent linear forms in $x_1,\dots,x_n$. Show that $R(\phi) = r$.

4. Let $\phi$ be a quadratic form in $x_1,\dots,x_m$ of rank $r$ and signature $s$, and let $\psi$ be a quadratic form in $x_{m+1},\dots,x_{m+n}$ of rank $r'$ and signature $s'$. Show that the $(m+n)$-ary quadratic form $\phi+\psi$ has rank $r+r'$ and signature $s+s'$.

5. Obtain an orthogonal reduction of the quadratic form

$$x^2+6y^2-z^2+4\sqrt{2}\,zx.$$

6. The equation, in rectangular non-homogeneous coordinates, of a 
central quadric Q is given by ^yz-\-izx^2xy+z^ = 1, Find the equations 
of the principal axes and the equation of $Q$ referred to these axes as coordinate axes.





7. Determine the principal axes of the quadric $Q$ given by

$$2x^2+2y^2-\sqrt{2}\,yz+\sqrt{2}\,zx+4xy = 1,$$

and obtain the equation of $Q$ referred to its principal axes as coordinate axes.

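8. Let $A$ be the matrix of the quadratic form $\phi(x_1,\dots,x_n)$, and let $\lambda$ be a characteristic root of $A$. Show that there exist values of $x_1,\dots,x_n$, not all zero, which satisfy the equation $\phi(x_1,\dots,x_n) = \lambda(x_1^2+\dots+x_n^2)$.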

9. An n-ary quadratic form $\phi$ vanishes only when all its variables are zero. Show that the rank of $\phi$ is $n$ and that its signature is $n$ or $-n$.

10. Show that the maximum (minimum) value of the hermitian form $\bar{x}^TAx$, subject to the condition $\bar{x}^Tx = 1$, is equal to the greatest (least) characteristic root of $A$. Obtain the analogous result for quadratic forms.

11. Find an orthogonal matrix $P$ such that $P^{-1}AP$ is diagonal, where



What are the maximum and minimum values of 
subject to the condition $x^2+y^2+z^2 = 1$?

12. Let A = 2 ) • Show that the least value assumed by the quadratic 

form $x^TA^TAx$, subject to the condition $x^Tx = 1$, is $(9-\sqrt{65})/2$.

13. Let $\lambda$ be a characteristic root of the complex matrix $A$, and let $M$ and $m$ be the greatest and least characteristic roots of $\bar{A}^TA$. Show that

$$m^{1/2} \le |\lambda| \le M^{1/2}.$$

14. Show that, given a bilinear form $\phi$ in the variables $x_1,\dots,x_n$, $y_1,\dots,y_n$, substitutions of the type

$$x = \bar{P}\xi, \qquad y = P\eta, \qquad P \text{ unitary},$$

can be found which turn $\phi$ into a bilinear form with a triangular matrix.

15. Two quadratic forms are said to be inverse if their matrices are inverses of each other. Show that if the inverse of a quadratic form $\phi$ exists, then it has the same signature as $\phi$.

16. Deduce from Theorem 6.4.1 (p. 183) that a quadratic form of rank 2 is the product of two linearly independent linear forms.

17. Show that a non-vanishing bilinear form $\psi$ in the two sets of variables $x_1,\dots,x_m$ and $y_1,\dots,y_n$ is equal to the product of a linear form in $x_1,\dots,x_m$ and a linear form in $y_1,\dots,y_n$ if and only if the rank of $\psi$ is 1.

18. Let $A$ be symmetric, $S$ skew-symmetric, and $A+S$ non-singular. Prove that the matrix $T = (A+S)^{-1}(A-S)$ satisfies the relations

$$T^T(A+S)T = A+S, \qquad T^T(A-S)T = A-S.$$

Deduce that the quadratic form $x^TAx$ is left invariant by the transformation of variables $x = Ty$.

Prove that all real transformations of this type which leave invariant the 
form may be expressed as 

Xi = Xyi—kXy^f 

where A = — 


X2 = — A;A2/i+A2/a» 




19. Suppose that the principal axes of the quadric $x^TAx = 1$ come into

coincidence with the coordinate axes when the quadric is rotated about the 
line specified by the vector Show that = 0 , where 

$$B = A-\mathrm{dg}(\lambda_1,\lambda_2,\lambda_3),$$

and $\lambda_1, \lambda_2, \lambda_3$ are the characteristic roots of $A$.

20. Determine the rank and signature of the (2n)-ary quadratic form 

^ 2 n* 

21. Show that the rank and signature of the quadratic form

$$\sum_{r,s=1}^{n}(\lambda rs+r+s)\,x_rx_s$$

are independent of $\lambda$.

22. Determine the rank and signature of the ternary quadratic form

$$ayz + bzx + cxy.$$

23. (i) Find the rank and signature of the quadratic form

$$a(x_1^2 + \dots + x_n^2) + 2b\sum_{1 \le r < s \le n} x_r x_s.$$

(ii) Show that there exist real, linearly independent, linear forms $y_1,\dots,y_n$ in $x_1,\dots,x_n$ such that, identically,

$$\sum_{1 \le r < s \le n} x_r x_s = y_1^2 - y_2^2 - \dots - y_n^2.$$

Express this result in terms of matrices.

24. Determine the rank and signature of the n-ary quadratic form …

25. A is a real symmetric matrix with a negative determinant. Show that there exists a real vector x such that $x'Ax < 0$.

Find real values of x, y, z for which

$$x^2 + 2y^2 + 3z^2 + 2yz - 2zx + 2xy < 0.$$

26. Show that, if $[A] = \mathrm{bd}_{|x|=1} |Ax|$, then $[A]^2 = \Lambda$ and $[A] \ge |\omega|$, where $\Lambda$ is the greatest characteristic root of $\bar{A}'A$ and $\omega$ is a characteristic root, greatest in modulus, of A.

27. Write $m(A) = \max_{\lambda \in \mathcal{S}} |\lambda|$, where $\mathcal{S}$ denotes the set of characteristic roots of A. Show that, for every normal matrix A,

$$m(A) = \mathrm{bd}_{|x|=1} |Ax|.$$

Deduce that, if A, B, and $A+B$ are normal, then

$$m(A+B) \le m(A) + m(B);$$

and that, if A, B, and AB are normal, then

$$m(AB) \le m(A)\,m(B).$$

28. Let $A = (a_{rs})$ be a hermitian matrix of order n, and let $\Lambda$ and $\Lambda'$ be its greatest and least characteristic roots. Show that $\Lambda' \le a_{rr} \le \Lambda$ $(r = 1,\dots,n)$.

29. Let A be a hermitian $n \times n$ matrix, and let B be the submatrix obtained when the last row and last column of A are deleted. Show that, if $\mu$ denotes any characteristic root of B and $\Lambda$, $\Lambda'$ denote the greatest and least characteristic roots of A, then $\Lambda' \le \mu \le \Lambda$.





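30. Let $A = (a_{rs})$ be a hermitian matrix of order n; and let, for $1 \le r \le n$, $\lambda_r$ and $\lambda_r'$ denote the greatest and least characteristic roots of the submatrix

$$\begin{pmatrix} a_{11} & \cdots & a_{1r} \\ \vdots & & \vdots \\ a_{r1} & \cdots & a_{rr} \end{pmatrix}.$$

Show that

$$\lambda_n' \le \lambda_{n-1}' \le \dots \le \lambda_2' \le \lambda_1' = \lambda_1 \le \lambda_2 \le \dots \le \lambda_{n-1} \le \lambda_n.$$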

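31. Let $(a_{rs})$ be a real symmetric $n \times n$ matrix, and write

$$m_k = \begin{vmatrix} a_{11} & \cdots & a_{1k} \\ \vdots & & \vdots \\ a_{k1} & \cdots & a_{kk} \end{vmatrix} \quad (k = 1,\dots,n).$$

Show that, if no $m_k$ vanishes, then there exist linearly independent linear forms $y_1,\dots,y_n$ such that

$$\sum_{r,s=1}^{n} a_{rs} x_r x_s = \frac{m_1}{m_0} y_1^2 + \frac{m_2}{m_1} y_2^2 + \dots + \frac{m_n}{m_{n-1}} y_n^2 \quad (m_0 = 1).$$

Deduce that the signature of $\sum a_{rs} x_r x_s$ is equal to $n - 2t$, where t is the number of changes of sign in the sequence $1, m_1, \dots, m_n$.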
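The sign-change rule of Problem 31 is easy to mechanise. The Python sketch below is an illustration only, not part of the text. It assumes a real symmetric matrix with rational entries and computes $m_1,\dots,m_n$ exactly with `fractions.Fraction`. It then returns $n - 2t$. The helper names `det` and `jacobi_signature` are hypothetical.

```python
from fractions import Fraction


def det(rows):
    """Exact determinant by Gaussian elimination with row exchanges."""
    m = [[Fraction(x) for x in row] for row in rows]
    n = len(m)
    result = Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if m[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            result = -result  # a row exchange changes the sign
        result *= m[c][c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for k in range(c, n):
                m[r][k] -= factor * m[c][k]
    return result


def jacobi_signature(a):
    """Signature n - 2t, t = number of sign changes in 1, m_1, ..., m_n (Problem 31)."""
    n = len(a)
    minors = [det([row[:k] for row in a[:k]]) for k in range(1, n + 1)]
    if any(m == 0 for m in minors):
        raise ValueError("some leading minor vanishes; the rule does not apply")
    seq = [Fraction(1)] + minors
    t = sum(1 for p, q in zip(seq, seq[1:]) if (p > 0) != (q > 0))
    return n - 2 * t


print(jacobi_signature([[1, 2], [2, 1]]))                      # 0  (1, 1, -3)
print(jacobi_signature([[10, 2, 2], [2, -2, 0], [2, 0, 3]]))   # 1  (1, 10, -24, -64)
```

The sketch declines matrices with a vanishing leading minor, since the hypothesis of the problem then fails.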

32. Let $L_1,\dots,L_k$, $\Lambda_1,\dots,\Lambda_k$ be linear forms in $x_1,\dots,x_n$, and let

$$f(x_1,\dots,x_n) = L_1\Lambda_1 + \dots + L_k\Lambda_k$$

be a non-singular n-ary quadratic form. Show that among $L_1,\dots,L_k$, $\Lambda_1,\dots,\Lambda_k$ there must be a set of n linearly independent linear forms.

33. The real non-singular quadratic form $f(x_1,\dots,x_n)$ in $n\ (\ge 2k)$ variables vanishes if $x_1 = \dots = x_k = 0$. Show that, by a real non-singular linear transformation, f can be reduced to the form

$$y_1y_{k+1} + y_2y_{k+2} + \dots + y_ky_{2k} + \phi(y_{2k+1},\dots,y_n),$$

where $\phi$ is non-singular; and deduce that the signature s of f satisfies the inequality $|s| \le n - 2k$.

34. Show that, if $x'Ax$ is a quadratic form of rank r and signature s, then there exists a subspace U of the space of real n-vectors such that $d(U) = \tfrac{1}{2}(r-s)$ and $x'Ax < 0$ for every non-zero vector $x \in U$.

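35. Let $s_k$ denote the sum of kth powers of the roots of the equation $x^n + a_1x^{n-1} + \dots + a_n = 0$ having complex coefficients. Show that the number of distinct roots is equal to the rank of the matrix

$$\begin{pmatrix} s_{2n-2} & s_{2n-3} & \cdots & s_{n-1} \\ s_{2n-3} & s_{2n-4} & \cdots & s_{n-2} \\ \vdots & \vdots & & \vdots \\ s_{n-1} & s_{n-2} & \cdots & s_0 \end{pmatrix}.$$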
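Problem 35 can be tried out on examples. The sketch below assumes rational coefficients, so that exact `Fraction` arithmetic applies, and the monic form $x^n + a_1x^{n-1} + \dots + a_n$ used above. It obtains $s_0,\dots,s_{2n-2}$ from Newton's identities and returns the rank of the matrix of Problem 35. The function names are hypothetical.

```python
from fractions import Fraction


def power_sums(coeffs, count):
    """s_0, ..., s_{count-1} for the roots of x^n + a_1 x^(n-1) + ... + a_n,
    where coeffs = [a_1, ..., a_n], computed by Newton's identities."""
    n = len(coeffs)
    a = [Fraction(c) for c in coeffs]
    s = [Fraction(n)]  # s_0 = n
    for k in range(1, count):
        total = sum((a[j - 1] * s[k - j] for j in range(1, min(k - 1, n) + 1)), Fraction(0))
        if k <= n:
            total += k * a[k - 1]
        s.append(-total)
    return s


def rank(rows):
    """Exact rank by Gauss-Jordan elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c] / m[r][c]
                m[i] = [x - factor * y for x, y in zip(m[i], m[r])]
        r += 1
    return r


def distinct_root_count(coeffs):
    """Rank of the matrix (s_{2n-2-i-j}), i, j = 0, ..., n-1, of Problem 35."""
    n = len(coeffs)
    s = power_sums(coeffs, 2 * n - 1)
    hankel = [[s[2 * n - 2 - i - j] for j in range(n)] for i in range(n)]
    return rank(hankel)


print(distinct_root_count([0, -3, 2]))  # 2, since x^3 - 3x + 2 = (x - 1)^2 (x + 2)
```

For $x^3 - 1$, which has three distinct complex roots, `distinct_root_count([0, 0, -1])` gives 3.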


36. Show that a real n-ary quadratic form is equal to the product of 
two real, linearly independent, linear forms if and only if its rank is 2 and 
its signature 0. 





XIII 


DEFINITE AND INDEFINITE FORMS 

In the previous chapter quadratic and hermitian forms were classified according to rank and signature. We shall now introduce another classification, which is based on the values that the forms in question are capable of assuming.† Though this new classification is cruder than the former (in the sense that each class will now consist of one or more of the former classes) it is, for some purposes, of even greater importance.

13.1. The value classes 

13.1.1. Definition 13.1.1. Let $\phi$ be a hermitian or a quadratic form in the variables $x_1,\dots,x_n$.

(i) $\phi$ is POSITIVE DEFINITE (NEGATIVE DEFINITE) if $\phi > 0$ ($\phi < 0$) except when $x_1 = \dots = x_n = 0$. A form which is positive definite or negative definite is called definite.

(ii) $\phi$ is POSITIVE SEMI-DEFINITE (NEGATIVE SEMI-DEFINITE) if $\phi \ge 0$ ($\phi \le 0$) for all values of $x_1,\dots,x_n$ and $\phi = 0$ for some values of $x_1,\dots,x_n$, not all zero. A form which is positive semi-definite or negative semi-definite is called semi-definite.

(iii) $\phi$ is INDEFINITE if it is capable of assuming both positive and negative values.

Definition 13.1.2. The five classes of hermitian (or quadratic) forms specified in the preceding definition will be called the VALUE CLASSES.‡

It is evident that the subdivision of hermitian or quadratic forms 
into the five value classes is exhaustive and also (if the trivial case 
of the identically vanishing form is ignored) exclusive. 

The following binary quadratic forms illustrate the five possible 
cases. 

† In speaking of the values of a quadratic (hermitian) form we mean, of course, the values it assumes when the variables range over all real (complex) numbers. We may also remind the reader that a hermitian form assumes real values only.

‡ The separation of quadratic forms into value classes has an important application in the study of maxima and minima of functions of several variables. See Stoll, 7, 127, Ex. 6.7.




Positive definite: …
Positive semi-definite: $(x+y)^2$
Negative definite: …
Negative semi-definite: $-(x+y)^2$
Indefinite: $x^2 - y^2$

If $\phi$ is positive definite (positive semi-definite), then $-\phi$ is negative definite (negative semi-definite). If $\phi$ is indefinite, then so is $-\phi$.

A real symmetric matrix, say A, is associated with two forms: the hermitian form $\bar{x}'Ax$ and the quadratic form $x'Ax$. Each of these forms belongs, of course, to a certain value class, but there is fortunately no possibility of confusion since the two forms belong, in fact, to the same value class.

Theorem 13.1.1. If A is a real symmetric matrix, then the hermitian form $\phi = \bar{x}'Ax$ and the quadratic form $\psi = x'Ax$ belong to the same value class.

If $\phi$ is positive definite, then (since the set of values assumed by $\psi$ is contained in the set of values assumed by $\phi$) $\psi$ is also positive definite. If, on the other hand, $\psi$ is positive definite, we write $x = \xi + i\eta$ (where $\xi$, $\eta$ are real) and obtain

$$\bar{x}'Ax = \xi'A\xi + \eta'A\eta. \qquad (13.1.1)$$

Hence $\bar{x}'Ax > 0$ except for $\xi = \eta = 0$, i.e. except for $x = 0$. Thus $\phi$ is positive definite. In the same way it is shown that $\phi$ and $\psi$ are negative definite together.
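The only step in (13.1.1) that uses the symmetry of A is the cancellation of the two imaginary cross terms. Written out, with $\xi$, $\eta$ real, it reads:

```latex
\begin{aligned}
\bar{x}'Ax &= (\xi - i\eta)'A(\xi + i\eta) \\
           &= \xi'A\xi + i\,\xi'A\eta - i\,\eta'A\xi + \eta'A\eta \\
           &= \xi'A\xi + \eta'A\eta,
\end{aligned}
\qquad \text{since } \eta'A\xi = (\eta'A\xi)' = \xi'A'\eta = \xi'A\eta .
```

Here $\eta'A\xi$ is a $1 \times 1$ matrix, so it equals its own transpose.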

Next, let $\psi$ be positive semi-definite. Then, by (13.1.1), $\bar{x}'Ax \ge 0$ for all x. Now, in view of the previous result, $\phi$ cannot be positive definite; hence it is positive semi-definite. Again, if $\phi$ is positive semi-definite, then a fortiori $x'Ax \ge 0$ for all real x. Moreover, $\psi$ cannot be positive definite, for then $\phi$ would be positive definite too. Hence $\psi$ is positive semi-definite. Negative semi-definiteness is, of course, dealt with in exactly the same way.

Again, if $\psi$ is indefinite, then $\phi$ is obviously indefinite, too. On the other hand, if $\phi$ is indefinite, then so is $\psi$ in view of the previous results.

It is sometimes convenient to transfer the terms ‘positive definite’ and so on from forms to matrices.

Definition 13.1.3. A hermitian matrix A is said to belong to the 
same value class as the associated hermitian form. 

Thus we may speak of positive definite matrices, indefinite matrices, and so on. When A is real and symmetric, the terms ‘hermitian matrix’ and ‘hermitian form’ in the above definition may, in view of Theorem 13.1.1, be replaced by ‘real symmetric matrix’ and ‘quadratic form’.

Exercise 13.1.1. Let $(a_{rs})$ be a positive definite hermitian matrix and let $b_1,\dots,b_n$ be real non-zero numbers. Show that the (hermitian) matrix $(a_{rs}b_rb_s)$ is again positive definite.

13.1.2. In order to devise tests for distinguishing between hermitian (or quadratic) forms belonging to different value classes it is useful to note in the first place that non-singular linear transformations do not affect the set of values assumed by a form. This is expressed more precisely in the next theorem.

Theorem 13.1.2. Let $\phi$, $\psi$ be two hermitian (quadratic) forms such that one can be transformed into the other by a complex (real) non-singular linear transformation. If $S(\phi)$, $S(\psi)$ denote the sets of values assumed by $\phi$, $\psi$ respectively as their variables take all complex (real) values, not all zero, then $S(\phi) = S(\psi)$.

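We shall state the proof in the language of hermitian forms. Let

$$\phi = \bar{x}'Ax, \quad \psi = \bar{x}'Bx.$$

By hypothesis there exists a non-singular matrix P such that $\bar{P}'AP = B$. Suppose that $\alpha \in S(\psi)$, i.e. for some complex vector $\xi \ne 0$, $\bar{\xi}'B\xi = \alpha$. Hence $(\overline{P\xi})'A(P\xi) = \alpha$, and since $P\xi \ne 0$ it follows that $\alpha \in S(\phi)$. Thus $S(\psi) \subseteq S(\phi)$. By symmetry (the transformation with matrix $P^{-1}$ carries $\psi$ into $\phi$) $S(\phi) \subseteq S(\psi)$, and the theorem is therefore proved.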

Theorem 13.1.3. The value class of a hermitian (quadratic) form 
is invariant under complex (real) non-singular linear transformations. 

This result follows at once from Theorem 13.1.2 since a hermitian, or quadratic, form $\phi$ is positive definite (negative definite) precisely when $S(\phi)$ contains only positive (negative) numbers, positive semi-definite (negative semi-definite) precisely when $S(\phi)$ contains only positive (negative) numbers and 0, and indefinite precisely when $S(\phi)$ contains both positive and negative numbers.

Alternatively, we may express Theorem 13.1.3 by saying that two hermitian (quadratic) forms which are equivalent with respect to the group of complex (real) non-singular linear transformations belong to the same value class. The converse need not be true, and this fact bears out our earlier remark about the relative crudity of the distribution of hermitian forms into value classes, as compared with their classification by rank and signature.

We shall next assign each hermitian or quadratic form to its value class by means of the characteristic roots of the associated matrix.

Theorem 13.1.4. Let $\phi$ be the hermitian form associated with the (hermitian) matrix A.

(i) $\phi$ is positive definite (negative definite) if and only if all characteristic roots of A are positive (negative).

(ii) $\phi$ is positive semi-definite (negative semi-definite) if and only if all characteristic roots of A are non-negative (non-positive) and at least one root is equal to zero.

(iii) $\phi$ is indefinite if and only if A has at least one positive and at least one negative root.

When A is real and symmetric this result enables us (in view of 
Theorem 13.1.1) to distinguish between value classes of quadratic 
forms. 

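By Theorems 12.6.1 (p. 387) and 13.1.3 we see that it suffices to consider the hermitian form

$$\phi = \lambda_1\bar{x}_1x_1 + \dots + \lambda_n\bar{x}_nx_n,$$

where $\lambda_1,\dots,\lambda_n$ are the characteristic roots of A. If $\lambda_1,\dots,\lambda_n > 0$, then $\phi > 0$ except when $x_1 = \dots = x_n = 0$, and so $\phi$ is positive definite.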

Next, let $\phi$ be positive definite and assume that $\lambda_1,\dots,\lambda_n$ are not all positive, say $\lambda_1 \le 0$. Then, for $x_1 = 1$, $x_2 = \dots = x_n = 0$, we have $\phi = \lambda_1 \le 0$; and this contradicts our hypothesis. Hence $\lambda_1,\dots,\lambda_n > 0$, and (i) is established, since the case of negative definite forms is treated similarly.

Again, suppose that all $\lambda_r$ are non-negative and that at least one of them vanishes, say $\lambda_1 = 0$; $\lambda_2,\dots,\lambda_n \ge 0$. Then $\phi \ge 0$ for all $x_1,\dots,x_n$, and $\phi = 0$ for $x_1 = 1$, $x_2 = \dots = x_n = 0$. Hence $\phi$ is positive semi-definite. If, on the other hand, $\phi$ is known to be positive semi-definite, then $\lambda_1,\dots,\lambda_n \ge 0$, for otherwise $\phi$ would be capable of assuming negative values. If now we had $\lambda_1,\dots,\lambda_n > 0$, then $\phi$ would be positive definite; hence at least one $\lambda_r$ must vanish. This proves (ii), since the case of negative semi-definite forms is treated similarly.

In view of (i) and (ii), the third part of the assertion now follows 
automatically. 
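For a real binary form $ax^2 + 2bxy + cy^2$, the characteristic roots of $\begin{pmatrix} a & b \\ b & c \end{pmatrix}$ are $\tfrac{1}{2}(a+c) \pm \sqrt{\tfrac{1}{4}(a-c)^2 + b^2}$. Theorem 13.1.4, together with Theorem 13.1.1, can therefore be applied directly. The following is a minimal floating-point sketch. The function name is hypothetical, and so is the tolerance `tol`, which guards against rounding blurring a zero root.

```python
import math


def binary_value_class(a, b, c, tol=1e-12):
    """Value class of the real form a x^2 + 2 b x y + c y^2, read off from the
    characteristic roots of [[a, b], [b, c]] as in Theorem 13.1.4."""
    centre = (a + c) / 2
    radius = math.hypot((a - c) / 2, b)
    roots = (centre + radius, centre - radius)  # greatest and least roots
    scale = max(1.0, abs(a), abs(b), abs(c))
    pos = sum(1 for lam in roots if lam > tol * scale)
    neg = sum(1 for lam in roots if lam < -tol * scale)
    if pos == 2:
        return "positive definite"
    if neg == 2:
        return "negative definite"
    if pos and neg:
        return "indefinite"
    if pos:
        return "positive semi-definite"
    if neg:
        return "negative semi-definite"
    return "identically zero"


print(binary_value_class(1, 1, 1))   # (x + y)^2: positive semi-definite
print(binary_value_class(1, 0, -1))  # x^2 - y^2: indefinite
```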




Exercise 13.1.2. Show that a definite form is non-singular, a semi-definite form singular, and that an indefinite form can be either.

Theorem 13.1.4 can easily be restated in terms of rank and 
signature. 

Theorem 13.1.5. Let $\phi$ be an n-ary hermitian form of rank r and signature s.

(i) $\phi$ is positive definite (negative definite) if and only if $r = s = n$ ($r = -s = n$).

(ii) $\phi$ is positive semi-definite (negative semi-definite) if and only if $r = s < n$ ($r = -s < n$).

(iii) $\phi$ is indefinite if and only if $|s| < r$.

This result follows as an immediate consequence of Theorems 
12.6.4 (p. 388) and 13.1.4. It is of greater practical importance than 
Theorem 13.1.4, for while the location of characteristic roots may 
be troublesome, rank and signature can be computed fairly easily 
by means of Lagrange’s method of reduction. 
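Lagrange's method can be carried out mechanically. Complete the square on a variable whose diagonal coefficient is non-zero. When every diagonal coefficient vanishes, first make a substitution $x_j = y_j + y_i$ to create one.

The Python sketch below counts the positive and negative squares $p$ and $q$, so that $r = p + q$ and $s = p - q$, and then applies Theorem 13.1.5. The names `inertia` and `classify` are hypothetical, and the sketch assumes real symmetric matrices with rational entries. Every step is a congruence, so by the invariance of rank and signature established in Chapter XII the counts do not depend on the choices made.

```python
from fractions import Fraction


def inertia(a):
    """Numbers (p, q) of positive and negative squares in a diagonal form
    obtained from the real symmetric matrix a by Lagrange's method."""
    m = [[Fraction(x) for x in row] for row in a]
    p = q = 0
    while m:
        size = len(m)
        i = next((k for k in range(size) if m[k][k] != 0), None)
        if i is None:
            pair = next(((r, s) for r in range(size) for s in range(r + 1, size)
                         if m[r][s] != 0), None)
            if pair is None:
                break  # what remains vanishes identically
            i, j = pair
            # substitute x_j = y_j + y_i (a congruence); the new (i, i) entry is 2 m[i][j] != 0
            for k in range(size):
                m[i][k] += m[j][k]
            for k in range(size):
                m[k][i] += m[k][j]
        d = m[i][i]
        if d > 0:
            p += 1
        else:
            q += 1
        # complete the square in x_i and keep the form left in the other variables
        rest = [k for k in range(size) if k != i]
        m = [[m[r][s] - m[r][i] * m[i][s] / d for s in rest] for r in rest]
    return p, q


def classify(a):
    """Value class from rank r = p + q and signature s = p - q (Theorem 13.1.5)."""
    n = len(a)
    p, q = inertia(a)
    r, s = p + q, p - q
    if r == s == n:
        return "positive definite"
    if r == -s == n:
        return "negative definite"
    if abs(s) < r:
        return "indefinite"
    if r == s > 0:
        return "positive semi-definite"
    if r == -s > 0:
        return "negative semi-definite"
    return "identically zero"


print(inertia([[10, 2, 2], [2, -2, 0], [2, 0, 3]]), classify([[10, 2, 2], [2, -2, 0], [2, 0, 3]]))
```

For the ternary form $10x^2 - 2y^2 + 3z^2 + 4zx + 4xy$ discussed in § 13.3, `inertia` gives `(2, 1)`. Thus $r = 3$ and $s = 1$, and the form is indefinite.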

Exercise 13.1.3. Show that two quadratic forms which belong to the same value class need not be equivalent (with respect to the group of real non-singular linear transformations) unless both are positive definite or both negative definite.

13.2. Transformations of positive definite forms 

We know by Theorem 12.6.1 that every hermitian matrix can be reduced to diagonal form. The proof of this fact depends on the comparatively difficult Theorem 10.3.7 (p. 304), but for the special case of positive definite forms a particularly simple proof independent of earlier results can be given. The proof below not only establishes the existence of a canonical form for every positive definite hermitian form but also enables us to obtain more precise information about the reducing matrix than could be extracted from earlier, and more general, arguments.

Theorem 13.2.1. If $\phi = \phi(x_1,\dots,x_n)$ is a positive definite hermitian (quadratic) form, then there exists a complex (real) linear transformation of determinant 1, specified by the equations

$$\begin{aligned} y_1 &= x_1 + \alpha_{12}x_2 + \dots + \alpha_{1n}x_n, \\ y_2 &= x_2 + \dots + \alpha_{2n}x_n, \\ &\;\;\vdots \\ y_n &= x_n, \end{aligned} \qquad (13.2.1)$$

which changes $\phi$ into the diagonal form

$$c_1\bar{y}_1y_1 + \dots + c_n\bar{y}_ny_n \quad (c_1y_1^2 + \dots + c_ny_n^2), \qquad (13.2.2)$$

where $c_1,\dots,c_n > 0$.

We state the proof for hermitian forms. For quadratic forms the reasoning is exactly the same except that each symbol $\bar{x}_r$ must be replaced by $x_r$. The argument below depends effectively on Lagrange’s method of reduction.

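Write $\phi = \phi(x_1,\dots,x_n) = \sum_{r,s=1}^{n} a_{rs}\bar{x}_rx_s$.

Then $a_{11} = \phi(1,0,\dots,0) > 0$, and so

$$\begin{aligned} \phi &= a_{11}\bar{x}_1x_1 + \sum_{r=2}^{n}(a_{1r}\bar{x}_1x_r + a_{r1}\bar{x}_rx_1) + \sum_{r,s=2}^{n} a_{rs}\bar{x}_rx_s \\ &= a_{11}\Bigl|x_1 + \frac{a_{12}}{a_{11}}x_2 + \dots + \frac{a_{1n}}{a_{11}}x_n\Bigr|^2 + \sum_{r,s=2}^{n}\Bigl(a_{rs} - \frac{a_{r1}a_{1s}}{a_{11}}\Bigr)\bar{x}_rx_s \\ &= c_1|x_1 + \alpha_{12}x_2 + \dots + \alpha_{1n}x_n|^2 + \psi(x_2,\dots,x_n), \end{aligned} \qquad (13.2.3)$$

say, where $c_1 = a_{11} > 0$ and $\psi(x_2,\dots,x_n)$ is a hermitian form since

$$\overline{a_{sr} - a_{s1}a_{1r}/a_{11}} = a_{rs} - a_{r1}a_{1s}/a_{11} \quad (r,s = 2,\dots,n).$$

Moreover, $\psi(x_2,\dots,x_n)$ is positive definite. For otherwise there exist numbers $\xi_2,\dots,\xi_n$, not all zero, such that $\psi(\xi_2,\dots,\xi_n) \le 0$. In that case let $\xi_1$ be defined by the equation

$$\xi_1 + \alpha_{12}\xi_2 + \dots + \alpha_{1n}\xi_n = 0.$$

Then $\phi(\xi_1,\dots,\xi_n) = \psi(\xi_2,\dots,\xi_n) \le 0$, and this contradicts the hypothesis that $\phi$ is positive definite.

Since, then, $\psi(x_2,\dots,x_n)$ is positive definite we may apply to it the same process that has just been applied to $\phi$. Carrying out this process repeatedly, we ultimately obtain

$$\phi = c_1|x_1 + \alpha_{12}x_2 + \dots + \alpha_{1n}x_n|^2 + c_2|x_2 + \alpha_{23}x_3 + \dots + \alpha_{2n}x_n|^2 + \dots + c_{n-1}|x_{n-1} + \alpha_{n-1,n}x_n|^2 + c_n|x_n|^2,$$

where $c_1,\dots,c_n > 0$. The transformation specified by (13.2.1) is now seen to change $\phi$ into the diagonal form (13.2.2).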
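The proof of Theorem 13.2.1 is effectively an algorithm. For real positive definite forms it can be sketched in Python as follows, assuming exact rational arithmetic; the names are hypothetical. The array `alpha` holds the coefficients $\alpha_{rs}$ of (13.2.1) as a unit upper triangular array, and `c` holds $c_1,\dots,c_n$.

A non-positive pivot means the form is not positive definite, since each successive $\psi$ would have to be positive definite. In that case the sketch raises `ValueError`.

```python
from fractions import Fraction


def reduce_positive_definite(a):
    """For a real symmetric positive definite matrix a, return (c, alpha) with
    phi(x) = sum_k c[k] * y_k**2,  y_k = x_k + sum_{s > k} alpha[k][s] * x_s,
    as in (13.2.1) and (13.2.2)."""
    m = [[Fraction(x) for x in row] for row in a]
    n = len(m)
    c = []
    alpha = [[Fraction(int(r == s)) for s in range(n)] for r in range(n)]
    for k in range(n):
        d = m[k][k]
        if d <= 0:
            raise ValueError("the form is not positive definite")
        c.append(d)
        for s in range(k + 1, n):
            alpha[k][s] = m[k][s] / d
        # coefficients of the remaining form psi in x_{k+1}, ..., x_n, cf. (13.2.3)
        for r in range(k + 1, n):
            for s in range(k + 1, n):
                m[r][s] -= m[r][k] * m[k][s] / d
    return c, alpha


def form(a, x):
    """Value of the quadratic form x'Ax."""
    return sum(a[r][s] * x[r] * x[s] for r in range(len(x)) for s in range(len(x)))


c, alpha = reduce_positive_definite([[4, 2, 2], [2, 5, 1], [2, 1, 6]])
print([int(ck) for ck in c])  # [4, 4, 5]
```

The transformation has determinant 1, so $c_1 c_2 \cdots c_n$ equals the determinant of the matrix of $\phi$ (here $4 \cdot 4 \cdot 5 = 80$). This is the content of Theorem 13.2.2.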




As immediate consequences of Theorem 13.2.1 we have the 
following results which are also implicit in our earlier discussion. 

Corollary 1. (i) Every positive definite n-ary hermitian form can be changed into $\bar{x}_1x_1 + \dots + \bar{x}_nx_n$ by a complex non-singular linear transformation.

(ii) Any two positive definite n-ary hermitian forms are equivalent with respect to the group of complex non-singular linear transformations.

Corollary 2. (i) Every positive definite n-ary quadratic form can be changed into $x_1^2 + \dots + x_n^2$ by a real non-singular linear transformation.

(ii) Any two positive definite n-ary quadratic forms are equivalent with respect to the group of real non-singular linear transformations.

A further useful result which can be derived from the above 
discussion is as follows. 

Theorem 13.2.2. The determinant of any positive definite hermitian matrix is positive.

Let A be the matrix associated with a positive definite hermitian form. Then, by Corollary 1 (i), we know that there exists a matrix P such that $\bar{P}'AP = I$. Hence $|\bar{P}|\,|P|\,|A| = 1$. Since $|\bar{P}| = \overline{|P|}$, this gives $\bigl||P|\bigr|^2|A| = 1$, and so $|A| > 0$.


13.3. Determinantal criteria 


13.3.1. In § 13.1.2 the value classes were discussed in relation to the characteristic roots and to rank and signature. We shall now approach the question from a different point of view and derive methods for deciding the value class of a hermitian or quadratic form in terms of the principal minors of its matrix. Our first and most fundamental result, which was discovered by Frobenius in 1894, relates to positive definite forms.


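Theorem 13.3.1. A necessary and sufficient condition for the hermitian form $\sum_{r,s=1}^{n} a_{rs}\bar{x}_rx_s$ to be positive definite is that

$$a_{11} > 0, \quad \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} > 0, \quad \dots, \quad \begin{vmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nn} \end{vmatrix} > 0. \qquad (13.3.1)$$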





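To prove the necessity of (13.3.1), suppose that $\phi(x_1,\dots,x_n)$ is positive definite and consider, for $1 \le m \le n$, the hermitian form

$$\sum_{r,s=1}^{m} a_{rs}\bar{x}_rx_s$$

in the variables $x_1,\dots,x_m$. This form is positive definite, for otherwise we should have $\phi(x_1,\dots,x_m,0,\dots,0) \le 0$ for some values of $x_1,\dots,x_m$, not all zero; and this would be contrary to the hypothesis that $\phi(x_1,\dots,x_n)$ is positive definite. Hence, by Theorem 13.2.2,

$$\begin{vmatrix} a_{11} & \cdots & a_{1m} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mm} \end{vmatrix} > 0.$$

This holds for $m = 1,\dots,n$, and (13.3.1) is therefore satisfied.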

Next, we use induction with respect to n to prove the sufficiency of (13.3.1). For $n = 1$ the assertion is trivial. Assume that it holds for $n-1$, where $n \ge 2$. We have to show that if (13.3.1) is satisfied, then $\phi$ is positive definite.

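Since $a_{11} > 0$ we can, by virtue of (13.2.3), write

$$\phi(x_1,\dots,x_n) = \frac{1}{a_{11}}|a_{11}x_1 + \dots + a_{1n}x_n|^2 + \psi(x_2,\dots,x_n), \qquad (13.3.2)$$

where $\psi(x_2,\dots,x_n) = \sum_{r,s=2}^{n} b_{rs}\bar{x}_rx_s$ is a hermitian form and

$$b_{rs} = a_{rs} - \frac{a_{r1}a_{1s}}{a_{11}} \quad (r,s = 2,\dots,n).$$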


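Now, for $m = 2,\dots,n$, we have

$$\begin{vmatrix} b_{22} & \cdots & b_{2m} \\ \vdots & & \vdots \\ b_{m2} & \cdots & b_{mm} \end{vmatrix} = \frac{1}{a_{11}} \begin{vmatrix} a_{11} & \cdots & a_{1m} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mm} \end{vmatrix}.$$

This identity follows immediately on subtracting (for $s = 2,\dots,m$) $a_{1s}/a_{11}$ times the first column from the sth column in the determinant on the right-hand side. Hence, by (13.3.1),

$$\begin{vmatrix} b_{22} & \cdots & b_{2m} \\ \vdots & & \vdots \\ b_{m2} & \cdots & b_{mm} \end{vmatrix} > 0 \quad (m = 2,\dots,n),$$

and so, by the induction hypothesis, $\psi(x_2,\dots,x_n)$ is positive definite. Suppose now that, for some values $x_1,\dots,x_n$, we have

$$\phi(x_1,\dots,x_n) \le 0.$$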


Then, using (13.3.2), the fact that $\psi$ is positive definite, and the inequality $a_{11} > 0$, we infer the relations

$$\psi(x_2,\dots,x_n) = 0, \qquad (13.3.3)$$

$$a_{11}x_1 + \dots + a_{1n}x_n = 0, \qquad (13.3.4)$$

for these values of $x_1,\dots,x_n$. Now (13.3.3) implies $x_2 = \dots = x_n = 0$. Hence, by (13.3.4), $x_1 = 0$. It follows that $\phi(x_1,\dots,x_n)$ is positive definite. The proof of the theorem is therefore complete.
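Frobenius's criterion (13.3.1) needs only the n leading principal minors. Here is a Python sketch for real symmetric matrices with rational entries; the helper names are hypothetical.

```python
from fractions import Fraction


def det(rows):
    """Exact determinant by Gaussian elimination with row exchanges."""
    m = [[Fraction(x) for x in row] for row in rows]
    n = len(m)
    result = Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if m[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            result = -result  # a row exchange changes the sign
        result *= m[c][c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for k in range(c, n):
                m[r][k] -= factor * m[c][k]
    return result


def leading_minors(a):
    """The n minors appearing in (13.3.1)."""
    n = len(a)
    return [det([row[:k] for row in a[:k]]) for k in range(1, n + 1)]


def is_positive_definite(a):
    """Frobenius's criterion: all leading principal minors positive (Theorem 13.3.1)."""
    return all(m > 0 for m in leading_minors(a))


print(is_positive_definite([[4, 2, 2], [2, 5, 1], [2, 1, 6]]))          # True
print([int(m) for m in leading_minors([[10, 2, 2], [2, -2, 0], [2, 0, 3]])])  # [10, -24, -64]
```

The matrix in the last line is that of the ternary form examined in § 13.3.2. Leading minors alone do not settle semi-definiteness; compare the remark preceding Theorem 13.3.7.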


In the formulation of the theorem just proved we considered the variables in the order of their suffixes, and our selection of minors in A was dependent on that order. But the value class of a hermitian or quadratic form is obviously unaffected by the labelling of the variables, and the theorem will therefore continue to hold if the variables are relabelled in any manner. Thus, for example, we know by Theorem 13.3.1 that the ternary hermitian form

$$\phi = \sum_{r,s=1}^{3} a_{rs}\bar{x}_rx_s$$

is positive definite if and only if

$$a_{11} > 0, \quad \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} > 0, \quad \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} > 0.$$

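Let us now write $y_2, y_3, y_1$ for $x_1, x_2, x_3$ respectively. Then $\phi$ assumes the form

$$\begin{aligned} \psi = {} & a_{33}\bar{y}_1y_1 + a_{31}\bar{y}_1y_2 + a_{32}\bar{y}_1y_3 \\ & + a_{13}\bar{y}_2y_1 + a_{11}\bar{y}_2y_2 + a_{12}\bar{y}_2y_3 \\ & + a_{23}\bar{y}_3y_1 + a_{21}\bar{y}_3y_2 + a_{22}\bar{y}_3y_3. \end{aligned}$$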


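The matrix associated with $\psi$ is therefore given by

$$\begin{pmatrix} a_{33} & a_{31} & a_{32} \\ a_{13} & a_{11} & a_{12} \\ a_{23} & a_{21} & a_{22} \end{pmatrix}.$$

Hence $\psi$, and so $\phi$, is positive definite if and only if

$$a_{33} > 0, \quad \begin{vmatrix} a_{33} & a_{31} \\ a_{13} & a_{11} \end{vmatrix} > 0, \quad \begin{vmatrix} a_{33} & a_{31} & a_{32} \\ a_{13} & a_{11} & a_{12} \\ a_{23} & a_{21} & a_{22} \end{vmatrix} > 0,$$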



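i.e. if and only if

$$a_{33} > 0, \quad \begin{vmatrix} a_{11} & a_{13} \\ a_{31} & a_{33} \end{vmatrix} > 0, \quad \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} > 0.$$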


In the same way we obtain other sets of conditions necessary and sufficient to ensure that $\phi$ should be positive definite, and we can readily satisfy ourselves that $\phi$ is positive definite if and only if all principal minors of the matrix of $\phi$ are positive. In the general case the situation is analogous, as is shown by the next theorem which, for convenience, we state in terms of matrices.


Theorem 13.3.2. A hermitian matrix is positive definite if and 
only if all its principal minors are positive. 


The proof of this result is implicit in the preceding illustration. Let the matrix in question be $A = (a_{rs})$. If all its principal minors are positive, then (13.3.1) is satisfied a fortiori and A is then positive definite by Theorem 13.3.1.

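Assume next that A is positive definite and consider the principal minor

$$\Delta = \begin{vmatrix} a_{k_1k_1} & \cdots & a_{k_1k_m} \\ \vdots & & \vdots \\ a_{k_mk_1} & \cdots & a_{k_mk_m} \end{vmatrix},$$

where $1 \le m \le n$, $1 \le k_1 < \dots < k_m \le n$. Let the numbers $k_{m+1},\dots,k_n$ be defined by the conditions

$$1 \le k_{m+1} < \dots < k_n \le n, \quad \{k_1,\dots,k_n\} = \{1,\dots,n\}.$$

The hermitian form $\phi = \sum_{r,s=1}^{n} a_{rs}\bar{x}_rx_s$ is positive definite. Changing the order of its terms we can rewrite this form as

$$\phi = \sum_{r,s=1}^{n} a_{k_rk_s}\bar{x}_{k_r}x_{k_s}.$$

Now let the variables $x_{k_1},\dots,x_{k_n}$ be replaced by $y_1,\dots,y_n$ respectively. The resulting hermitian form

$$\psi = \sum_{r,s=1}^{n} a_{k_rk_s}\bar{y}_ry_s$$

is again positive definite and therefore, by Theorem 13.3.1, $\Delta > 0$. The proof is thus complete.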






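Exercise 13.3.1. Prove Theorem 13.3.2 by equating to zero selected x's in the form $\sum_{r,s=1}^{n} a_{rs}\bar{x}_rx_s$.

Exercise 13.3.2. Show that the hermitian $n \times n$ matrix $(a_{rs})$ is positive definite if and only if

$$a_{nn} > 0, \quad \begin{vmatrix} a_{n-1,n-1} & a_{n-1,n} \\ a_{n,n-1} & a_{nn} \end{vmatrix} > 0, \quad \dots, \quad \begin{vmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nn} \end{vmatrix} > 0.$$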


13.3.2. From Frobenius’s basic result on positive definite forms it is easy to deduce determinantal criteria for the remaining value classes. We note in the first place that since a hermitian form $\phi$ is negative definite if and only if $-\phi$ is positive definite, Theorem 13.3.1 leads at once to conditions for negative definiteness.


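Theorem 13.3.3. A necessary and sufficient condition for the hermitian form $\sum_{r,s=1}^{n} a_{rs}\bar{x}_rx_s$ to be negative definite is that

$$a_{11} < 0, \quad \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} > 0, \quad \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} < 0, \quad \dots,$$

the signs alternating up to the determinant of order n.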

As in the case of positive definite forms, this result can be reformulated in such a way as to become independent of any particular labelling of the variables. Indeed, Theorem 13.3.2 implies immediately our next theorem.


Theorem 13.3.4. A hermitian matrix is negative definite if and only if all its principal minors of even order are positive and all its principal minors of odd order are negative.

One of the advantages of the determinantal criteria is the ease with which they can be applied. To illustrate their use consider the ternary quadratic form

$$\phi = 10x^2 - 2y^2 + 3z^2 + 4zx + 4xy,$$

whose matrix is

$$\begin{pmatrix} 10 & 2 & 2 \\ 2 & -2 & 0 \\ 2 & 0 & 3 \end{pmatrix}.$$

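We have

$$10 > 0; \quad \begin{vmatrix} 10 & 2 \\ 2 & -2 \end{vmatrix} = -24 < 0; \quad \begin{vmatrix} 10 & 2 & 2 \\ 2 & -2 & 0 \\ 2 & 0 & 3 \end{vmatrix} = -64 < 0.$$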




Hence $\phi$ is neither positive definite nor negative definite. Moreover, it is non-singular and so cannot be semi-definite. Hence $\phi$ is indefinite.

We next turn to semi-definite forms and matrices. 


Theorem 13.3.5. A hermitian matrix is positive semi-definite if 
and only if it is singular and all its principal minors are non-negative. 

Denote the matrix in question by $A = (a_{rs})$ and suppose that it is positive semi-definite. Then, by Theorem 13.1.4 (ii) (p. 397), at least one of its characteristic roots vanishes and the matrix is therefore singular. Moreover, by hypothesis,

$$\phi(x_1,\dots,x_n) = \bar{x}'Ax \ge 0$$

for all x. Let $1 \le m \le n$, $1 \le k_1 < \dots < k_m \le n$, and denote by $\psi(x_{k_1},\dots,x_{k_m})$ the m-ary hermitian form obtained from $\phi(x_1,\dots,x_n)$ on equating to zero every $x_r$ whose suffix is not equal to one of $k_1,\dots,k_m$. Then clearly $\psi(x_{k_1},\dots,x_{k_m}) \ge 0$ for all $x_{k_1},\dots,x_{k_m}$, and so the matrix

$$\begin{pmatrix} a_{k_1k_1} & \cdots & a_{k_1k_m} \\ \vdots & & \vdots \\ a_{k_mk_1} & \cdots & a_{k_mk_m} \end{pmatrix}$$

is either positive definite or positive semi-definite. Its characteristic roots are therefore all non-negative and consequently its determinant is also non-negative. Thus all principal minors of A are non-negative.

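Assume next that A is singular and that all its principal minors are non-negative. Writing

$$A_m = \begin{pmatrix} a_{11} & \cdots & a_{1m} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mm} \end{pmatrix} \quad (m = 1,\dots,n), \qquad (13.3.5)$$

we have

$$|tI + A_m| = t^m + p_1t^{m-1} + \dots + p_m,$$

where, by Theorem 7.1.2 (p. 197), $p_i$ is equal to the sum of all i-rowed principal minors of $A_m$. Hence, by hypothesis,

$$|tI + A_m| > 0 \quad (m = 1,\dots,n)$$

when $t > 0$. It follows by Theorem 13.3.1 that $tI + A$ is positive definite for every $t > 0$. Now assume that, for some $x \ne 0$ and some $\tau > 0$,

$$\bar{x}'Ax = -\tau.$$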




Writing $t = \tau/\bar{x}'x$, we then have $\bar{x}'(tI + A)x = 0$, and this is not possible in view of our earlier conclusion. Hence $\bar{x}'Ax \ge 0$ for all x, and since A is singular it must be positive semi-definite.


As an immediate, consequence of the theorem just proved we 
have the following result for negative semi-definite matrices. 

Theorem 13.3.6. A hermitian matrix is negative semi-definite if and only if it is singular and all its principal minors of even order are non-negative, while all those of odd order are non-positive.

In view of Theorem 13.3.1 it is plausible to conjecture that the hermitian matrix $A = (a_{rs})$ is positive semi-definite if and only if $|A| = 0$ and $|A_m| \ge 0$ ($m = 1,\dots,n-1$), where $A_m$ is defined by (13.3.5). This conjecture is, however, false, as is demonstrated, for example, by the matrix … . We have, on the other hand, the following correct (though incomplete) analogue of Theorem 13.3.1.


Theorem 13.3.7. Let $A = (a_{rs})$ be a hermitian matrix. If

$$|A_1| > 0, \quad |A_2| > 0, \quad \dots, \quad |A_{n-1}| > 0, \quad |A_n| = 0,$$

where $A_m$ is defined by (13.3.5), then A is positive semi-definite.

Since A is singular it cannot be positive definite and it suffices, therefore, to show that $\phi(x_1,\dots,x_n) = \bar{x}'Ax$ is non-negative for all x. Assume, on the contrary, that for some $x \ne 0$ and some $\tau > 0$ we have $\bar{x}'Ax = -\tau$. Then $x_n \ne 0$, for

$$\phi(x_1,\dots,x_{n-1},0) = \sum_{r,s=1}^{n-1} a_{rs}\bar{x}_rx_s$$

is a positive definite form in $x_1,\dots,x_{n-1}$ by virtue of Theorem 13.3.1. Writing $t = \tau/\bar{x}_nx_n$, we therefore have $\bar{x}'Ax + t\bar{x}_nx_n = 0$, i.e.

$$\bar{x}'Bx = 0, \qquad (13.3.6)$$

where $x \ne 0$ and

$$B = A + \mathrm{dg}(0,\dots,0,t).$$

Now $t > 0$, and therefore

$$|B| = |A| + t|A_{n-1}| > 0.$$





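Hence, again by Theorem 13.3.1 (the leading principal minors of B of orders $1,\dots,n-1$ are those of A, and so are positive), B is a positive definite matrix and therefore (13.3.6) implies $x = 0$. We thus arrive at a contradiction, and it follows that $\bar{x}'Ax \ge 0$ for all x.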

Exercise 13.3.3. Suppose that A is a singular hermitian matrix and that

$$\begin{vmatrix} a_{mm} & \cdots & a_{mn} \\ \vdots & & \vdots \\ a_{nm} & \cdots & a_{nn} \end{vmatrix} > 0 \quad (m = 2,\dots,n).$$

Show that A is positive semi-definite.

A criterion for indefinite matrices can be obtained without 
difficulty from Theorems 13.3.2, 13.3.4, 13.3.5, and 13.3.6. 

Theorem 13.3.8. A hermitian matrix A is indefinite if and only if it satisfies at least one of the following two conditions.

(i) A possesses a negative principal minor of even order. 

(ii) A possesses a positive principal minor of odd order and also a 
negative principal minor of odd order. 

It is, in the first place, obvious that if A satisfies either of the 
two stated conditions, then it is indefinite. It remains, therefore, 
to show that if A is indefinite and does not satisfy (i), it must satisfy 
(ii). Since A is indefinite it possesses at least one negative principal 
minor, and by hypothesis all its negative principal minors are of 
odd order. Now if all principal minors of odd order were non-positive, then A would be negative definite or negative semi-definite. Hence at least one principal minor of odd order is positive and (ii)
is therefore satisfied. 

It should be noted that conditions (i) and (ii) are not incompatible 
but that neither of them implies the other. Thus, of the three 
indefinite real symmetric matrices 

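$$\begin{pmatrix} 1 & 0 & 1 \\ \cdots & \cdots & \cdots \\ \cdots & \cdots & \cdots \end{pmatrix}, \quad \begin{pmatrix} 2 & 4 & 2 \\ \cdots & \cdots & \cdots \\ \cdots & \cdots & \cdots \end{pmatrix}, \quad \begin{pmatrix} 2 & 1 & 1 \\ \cdots & \cdots & \cdots \\ \cdots & \cdots & \cdots \end{pmatrix},$$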

the first satisfies both (i) and (ii), the second (i) only, and the third 
(ii) only. 
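Theorems 13.3.2, 13.3.4, 13.3.5, 13.3.6, and 13.3.8 together decide the value class from the principal minors alone. The Python sketch below assumes real symmetric matrices with rational entries; the function names are hypothetical. It enumerates all $2^n - 1$ principal minors with `itertools.combinations`, so it is practical only for small n. For larger forms the reduction behind Theorem 13.1.5 is far cheaper.

```python
from fractions import Fraction
from itertools import combinations


def det(rows):
    """Exact determinant by Gaussian elimination with row exchanges."""
    m = [[Fraction(x) for x in row] for row in rows]
    n = len(m)
    result = Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if m[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            result = -result
        result *= m[c][c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for k in range(c, n):
                m[r][k] -= factor * m[c][k]
    return result


def principal_minors(a):
    """Yield (order, value) for every principal minor of the square matrix a."""
    n = len(a)
    for k in range(1, n + 1):
        for idx in combinations(range(n), k):
            yield k, det([[a[i][j] for j in idx] for i in idx])


def value_class(a):
    minors = list(principal_minors(a))
    full = minors[-1][1]  # the minor of order n is |A| itself
    if all(v == 0 for row in a for v in row):
        return "identically zero"
    if all(v > 0 for _, v in minors):                                  # Theorem 13.3.2
        return "positive definite"
    if all((v > 0) if k % 2 == 0 else (v < 0) for k, v in minors):     # Theorem 13.3.4
        return "negative definite"
    if full == 0 and all(v >= 0 for _, v in minors):                   # Theorem 13.3.5
        return "positive semi-definite"
    if full == 0 and all((v >= 0) if k % 2 == 0 else (v <= 0) for k, v in minors):  # 13.3.6
        return "negative semi-definite"
    return "indefinite"


def indefiniteness_conditions(a):
    """Which of conditions (i) and (ii) of Theorem 13.3.8 hold."""
    minors = list(principal_minors(a))
    cond_i = any(k % 2 == 0 and v < 0 for k, v in minors)
    cond_ii = (any(k % 2 == 1 and v > 0 for k, v in minors)
               and any(k % 2 == 1 and v < 0 for k, v in minors))
    return cond_i, cond_ii


A = [[10, 2, 2], [2, -2, 0], [2, 0, 3]]
print(value_class(A), indefiniteness_conditions(A))  # indefinite (True, True)
```

Applied to the matrix of the ternary form of p. 404, the sketch reports that both (i) and (ii) hold. The minor $-24$ is negative and of even order, while the diagonal entries 10 and $-2$ are odd-order minors of opposite signs.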

Exercise 13.3.4. Use Theorem 13.3.8 to show that the ternary quadratic form mentioned on p. 404 is indefinite.






13.4. Simultaneous reduction of two quadratic forms 

13.4.1. In § 12.2 and § 12.3 we considered the reduction of a given quadratic form to diagonal form. There are occasions, however, when we wish to go a step further and to effect a simultaneous reduction of two quadratic forms. This notion is explained more precisely in the next definition.

Definition 13.4.1. The quadratic forms $x'Ax$, $x'Bx$ are said to be SIMULTANEOUSLY REDUCIBLE (to diagonal forms) if there exists a non-singular linear substitution $x = Uy$ which transforms both the given quadratic forms into diagonal forms.

The need for simultaneous reduction arises, for example, in 
geometrical problems when we attempt to choose a system of 
coordinates in such a way that the equations of two given conics 
both assume diagonal form. Now it is not possible to carry out in 
every case a simultaneous reduction of two quadratic forms, and 
the aim of the present section is to obtain certain conditions which 
ensure the existence of such a reduction. Of the two cases discussed 
below the first might, in fact, have been disposed of in Chapter 
XII, but it is more convenient to treat both cases together.†

If A and B are given matrices we shall need to consider the equation $|A - \lambda B| = 0$ in the unknown $\lambda$. If B is non-singular, this equation is of degree n and so has n roots. When $B = I$ the equation becomes simply the characteristic equation of A.

13.4.2. It does not matter greatly whether the discussion in this paragraph is concerned with real or with complex matrices. We choose the latter alternative since it enables us to formulate the main result (Theorem 13.4.2) in a slightly neater form.

Theorem 13.4.1. If A, B are complex symmetric matrices, $\mu$ and $\nu$ distinct roots of the equation $|A - \lambda B| = 0$, and x, y vectors such that

$$(A - \mu B)x = 0, \quad (A - \nu B)y = 0,$$

then $x'Ay = 0$, $x'By = 0$.

We have

$$\mu x'By = \mu y'Bx = y'Ax = x'Ay = \nu x'By.$$

Since $\mu \ne \nu$, this implies $x'By = 0$, and hence $x'Ay = \nu x'By = 0$.

† For a treatment of simultaneous reductions of two quadratic forms which makes no use of matrix algebra see Bôcher, 11, 167-73.




Theorem 13.4.2. If A, B are complex symmetric matrices, B is non-singular, and the roots $\lambda_1,\dots,\lambda_n$ of the equation $|A - \lambda B| = 0$ are distinct, then the quadratic forms $x'Ax$, $x'Bx$ can be reduced simultaneously to the diagonal forms

$$\lambda_1y_1^2 + \dots + \lambda_ny_n^2, \quad y_1^2 + \dots + y_n^2$$

respectively.

Let $x_1,\dots,x_n$ be any non-zero (complex) vectors such that, for $r = 1,\dots,n$, $(A - \lambda_rB)x_r = 0$, i.e.

$$Ax_r = \lambda_rBx_r \quad (r = 1,\dots,n). \qquad (13.4.1)$$

Assume that $x_1,\dots,x_n$ are linearly dependent. Then there exists a number k in the range $2 \le k \le n$ such that $x_1,\dots,x_{k-1}$ are linearly independent while $x_1,\dots,x_{k-1},x_k$ are linearly dependent. There exist, therefore, numbers $\alpha_1,\dots,\alpha_k$, not all zero, such that

$$0 = \alpha_1x_1 + \dots + \alpha_kx_k. \qquad (13.4.2)$$

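Now $\alpha_1,\dots,\alpha_{k-1}$ cannot all be zero, and we may assume without loss of generality that $\alpha_1 \ne 0$. Using (13.4.1) we obtain

$$0 = \alpha_1Ax_1 + \dots + \alpha_kAx_k = B(\alpha_1\lambda_1x_1 + \dots + \alpha_k\lambda_kx_k).$$

But $|B| \ne 0$, and so

$$0 = \alpha_1\lambda_1x_1 + \dots + \alpha_k\lambda_kx_k. \qquad (13.4.3)$$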

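By (13.4.2) and (13.4.3) we have

$$0 = \alpha_1(\lambda_1 - \lambda_k)x_1 + \dots + \alpha_{k-1}(\lambda_{k-1} - \lambda_k)x_{k-1},$$

and since $\alpha_1 \ne 0$ and all $\lambda_r$ are distinct, this implies that, contrary to our assumption, $x_1,\dots,x_{k-1}$ are linearly dependent. Thus we arrive at a contradiction and it follows that $x_1,\dots,x_n$ are, in fact, linearly independent. Hence the matrix U defined by the relations $U_{*r} = x_r$ ($r = 1,\dots,n$) is non-singular. Writing

$$x_r'Bx_r = \mu_r \quad (r = 1,\dots,n),$$

we have $x_r'Ax_r = \lambda_r\mu_r$ ($r = 1,\dots,n$).

Hence, by Theorem 13.4.1,

$$x_r'Ax_s = \delta_{rs}\lambda_r\mu_r, \quad x_r'Bx_s = \delta_{rs}\mu_r \quad (r,s = 1,\dots,n),$$

and therefore

$$(U'AU)_{rs} = (U')_{r*}AU_{*s} = (U_{*r})'AU_{*s} = x_r'Ax_s = \delta_{rs}\lambda_r\mu_r.$$

Thus $U'AU = \mathrm{dg}(\lambda_1\mu_1,\dots,\lambda_n\mu_n)$, and similarly $U'BU = \mathrm{dg}(\mu_1,\dots,\mu_n)$.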




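Since B and U are both non-singular, it follows from the last equation that $\mu_1,\dots,\mu_n \ne 0$. Hence, on replacing each $x_r$ by $x_r/\sqrt{\mu_r}$ (any square root will do), $x_1,\dots,x_n$ can be chosen in such a way that $\mu_1 = \dots = \mu_n = 1$, and we then have

$$U'AU = \mathrm{dg}(\lambda_1,\dots,\lambda_n), \quad U'BU = I.$$

This proves our assertion.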
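The proof of Theorem 13.4.2 is constructive. For $n = 2$ it can be followed step by step in Python:

- solve the quadratic $|A - \lambda B| = 0$;
- take a non-zero solution $x_r$ of $(A - \lambda_rB)x = 0$;
- divide $x_r$ by a square root of $\mu_r = x_r'Bx_r$.

The following is a floating-point sketch under the hypotheses of the theorem: A and B complex symmetric, B non-singular, and distinct roots. The function names are hypothetical. Here $x'$ is the plain transpose, so no complex conjugates appear.

```python
import cmath


def congruence(m, u):
    """Return u' m u for 2 x 2 matrices (u' is the transpose, no conjugation)."""
    return [[sum(u[k][i] * m[k][l] * u[l][j] for k in range(2) for l in range(2))
             for j in range(2)] for i in range(2)]


def simultaneous_reduction_2x2(a, b):
    """Return (roots, u) with u' a u = dg(roots) and u' b u = I, assuming a, b are
    2 x 2 complex symmetric, b non-singular, and |a - lambda b| = 0 has distinct roots."""
    (a11, a12), (a21, a22) = a
    (b11, b12), (b21, b22) = b
    c2 = b11 * b22 - b12 * b21                              # |b|
    c1 = -(a11 * b22 + a22 * b11 - a12 * b21 - a21 * b12)
    c0 = a11 * a22 - a12 * a21                              # |a|
    if c2 == 0:
        raise ValueError("b must be non-singular")
    root_disc = cmath.sqrt(c1 * c1 - 4 * c2 * c0)
    if root_disc == 0:
        raise ValueError("the roots of |a - lambda b| = 0 must be distinct")
    roots = [(-c1 + root_disc) / (2 * c2), (-c1 - root_disc) / (2 * c2)]
    columns = []
    for lam in roots:
        m00, m01 = a11 - lam * b11, a12 - lam * b12
        m10, m11 = a21 - lam * b21, a22 - lam * b22
        # a non-zero x with (a - lam b) x = 0, read off from the larger row
        if abs(m00) + abs(m01) >= abs(m10) + abs(m11):
            x = (-m01, m00)
        else:
            x = (-m11, m10)
        mu = (x[0] * (b11 * x[0] + b12 * x[1])
              + x[1] * (b21 * x[0] + b22 * x[1]))          # mu_r = x_r' b x_r
        scale = cmath.sqrt(mu)                              # mu_r != 0 by the proof
        columns.append((x[0] / scale, x[1] / scale))
    u = [[columns[0][0], columns[1][0]], [columns[0][1], columns[1][1]]]
    return roots, u


roots, u = simultaneous_reduction_2x2([[1, 0], [0, 0]], [[2, 1], [1, 1]])
print(roots)  # the roots 1 and 0 of lambda^2 - lambda = 0, as complex numbers
```

Because $\mu_r$ may be negative or complex, `cmath.sqrt` is used throughout. Over the reals this normalisation is not always possible, which is why the real analogue stated below keeps general diagonal coefficients.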

Exercise 13.4.1. Interpret Theorems 13.4.1 and 13.4.2 for the case $B = I$.

A result analogous to Theorem 13.4.2 can, of course, be established for real quadratic forms and real transformations by precisely the same argument as used above. Indeed, if $x^TAx$, $x^TBx$ are real quadratic forms, B is non-singular, and all roots of $|A-\lambda B| = 0$ are real and distinct, then there exists a real non-singular linear transformation $x = Uy$ which carries the two given quadratic forms into $\alpha_1y_1^2+\dots+\alpha_ny_n^2$ and $\beta_1y_1^2+\dots+\beta_ny_n^2$ respectively.

13.4.3. We shall now assume that all matrices and quadratic forms under consideration are real and that one of the two given quadratic forms, say $x^TBx$, is positive definite. Two vectors $x$, $y$ will be said to be B-orthogonal if

$$x^TBy = 0.$$

Exercise 13.4.2. What is the meaning of the term ‘I-orthogonal’?

Theorem 13.4.3. If A, B are real symmetric matrices and B is positive definite, then there exists a real non-singular linear transformation $x = Uy$ which changes the quadratic forms $x^TAx$, $x^TBx$ into

$$\lambda_1y_1^2+\dots+\lambda_ny_n^2,\qquad y_1^2+\dots+y_n^2$$

respectively, where $\lambda_1,\dots,\lambda_n$ are the roots of the equation $|A-\lambda B| = 0$.

This theorem is due to Weierstrass (1858). The special case when 
Ai,...,A,^ are distinct had been discussed earlier by Cauchy and by 
Jacobi. 

We apply, in succession, two real non-singular linear transformations to $x^TAx$ and $x^TBx$. The first is chosen so as to carry $x^TBx$ into the unit form.† It carries $x^TAx$ into some new quadratic form which we call $\psi$. The second transformation is orthogonal and chosen so as to carry $\psi$ into a diagonal form;‡ it will, of course, leave the unit form unchanged. The product of the two transformations is a real non-singular transformation $x = Uy$ which carries $x^TBx$ into $y_1^2+\dots+y_n^2$ and $x^TAx$ into some diagonal quadratic form, say $\omega_1y_1^2+\dots+\omega_ny_n^2$. Then

$$U^TAU = \mathrm{dg}(\omega_1,\dots,\omega_n),\qquad U^TBU = I,$$

and so $U^T(A-\lambda B)U = \mathrm{dg}(\omega_1-\lambda,\dots,\omega_n-\lambda)$. Hence

$$|U|^2\,|A-\lambda B| = |U^T(A-\lambda B)U| = (\omega_1-\lambda)\cdots(\omega_n-\lambda),$$

and so $|A-\lambda B| = c(\lambda-\omega_1)\cdots(\lambda-\omega_n)$, where $c = (-1)^n|U|^{-2}\neq0$. Hence $\omega_1,\dots,\omega_n$ are the roots of $|A-\lambda B| = 0$, and the theorem is proved.

† This is possible by virtue of Corollary 2 (i) to Theorem 13.2.1 (p. 400).

‡ This is possible by virtue of the theorem on the orthogonal reduction of quadratic forms (p. 363).
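The two-step construction of this proof can be sketched in code. This is a minimal illustration under our own choices, not the text's: the first transformation is taken from the Cholesky factorization $B = LL^T$ (so that $x = L^{-T}z$ carries $x^TBx$ into $z^Tz$ and $x^TAx$ into $z^TCz$ with $C = L^{-1}AL^{-T}$), and the orthogonal second step uses cyclic Jacobi rotations, a standard method substituted here for the theorem on orthogonal reduction. All helper names are hypothetical, the matrices are those of the worked example after Theorem 13.4.7, and plain floats are used, so results hold only up to rounding.

```python
import math

def transpose(X):
    return [list(row) for row in zip(*X)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def cholesky(B):
    """Lower-triangular L with B = L L^T (B real symmetric positive definite)."""
    n = len(B)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = B[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def lower_inverse(L):
    """Inverse of a non-singular lower-triangular matrix, column by column."""
    n = len(L)
    inv = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for i in range(j, n):
            rhs = (1.0 if i == j else 0.0) - sum(L[i][k] * inv[k][j] for k in range(j, i))
            inv[i][j] = rhs / L[i][i]
    return inv

def jacobi(S, tol=1e-13, max_sweeps=100):
    """Cyclic Jacobi rotations: returns (diagonal, V), V orthogonal, V^T S V diagonal."""
    n = len(S)
    S = [list(map(float, row)) for row in S]
    V = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if max((abs(S[p][q]) for p in range(n) for q in range(p + 1, n)), default=0.0) < tol:
            break
        for p in range(n):
            for q in range(p + 1, n):
                if abs(S[p][q]) < tol:
                    continue
                theta = (S[q][q] - S[p][p]) / (2 * S[p][q])
                t = (1.0 if theta >= 0 else -1.0) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                J = [[float(i == j) for j in range(n)] for i in range(n)]
                J[p][p] = J[q][q] = c
                J[p][q], J[q][p] = t * c, -t * c
                S = matmul(matmul(transpose(J), S), J)   # annihilates S[p][q]
                V = matmul(V, J)
    return [S[i][i] for i in range(n)], V

def reduce_pair(A, B):
    """Two-step reduction in the spirit of Theorem 13.4.3; returns (omegas, U)."""
    Linv = lower_inverse(cholesky(B))
    C = matmul(matmul(Linv, A), transpose(Linv))  # x = L^{-T} z gives x^T B x = z^T z
    omegas, V = jacobi(C)                         # orthogonal step z = V y
    return omegas, matmul(transpose(Linv), V)     # U = L^{-T} V

A = [[0, -1, 1], [-1, 1, -1], [1, -1, 2]]
B = [[2, 3, 1], [3, 9, -1], [1, -1, 2]]
omegas, U = reduce_pair(A, B)
print(sorted(round(w, 10) for w in omegas))   # [-1.0, 1.0, 1.0]
```

Even in this small case the product $U = L^{-T}V$ has to be formed explicitly, which is the tedium the text refers to below.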

Theorem 13.4.3 is mainly valuable as an existence theorem, for 
the actual construction of the reducing matrix U as described in 
the proof above is extremely tedious in all but the simplest 
numerical cases. The reason for this is that to obtain the required 
transformation we have first to determine the two auxiliary 
transformations and then to compute their product. It is therefore 
desirable to devise a method which effects the simultaneous reduction in a single step. Such a method will be found in Theorem 13.4.7,
but before we can prove this result we shall have to consider 
certain preliminary questions. 

In each of the next three theorems it is assumed that A, B are 
real and symmetric and that B is positive definite. 

Theorem 13.4.4. All roots of the equation $|A-\lambda B| = 0$ are real.

This result, which is a generalization of the real case of Theorem 7.5.1 (p. 209), is almost obvious. For Theorem 13.4.3 establishes the existence of a real matrix U such that $U^TAU = \mathrm{dg}(\lambda_1,\dots,\lambda_n)$, where $\lambda_1,\dots,\lambda_n$ are the roots of $|A-\lambda B| = 0$. Hence all these roots are real.

We may also note an alternative proof depending on the same idea as that used in the proof of Theorem 7.6.1. Let $\omega$ be any root of $|A-\lambda B| = 0$ and let $x$ be any (possibly complex) non-zero vector satisfying

$$Ax = \omega Bx. \tag{13.4.4}$$

The quadratic form $x^TBx$ and the hermitian form $\bar x^TBx$ belong, by Theorem 13.1.1 (p. 396), to the same value class. Hence the hermitian form $\bar x^TBx$ is positive definite, and so the vector $x$ in (13.4.4) may be chosen so as to satisfy the additional relation $\bar x^TBx = 1$. We then have

$$\omega = \omega\,\bar x^TBx = \bar x^TAx,$$

$$\bar\omega = \overline{\bar x^TAx} = x^TA\bar x = (x^TA\bar x)^T = \bar x^TAx = \omega;$$

hence $\omega = \bar\omega$, and the theorem is proved once again.




Theorem 13.4.5. If $\omega$ is an m-fold root of the equation $|A-\lambda B| = 0$, then the vector space of vectors $x$ such that $(A-\omega B)x = 0$ has dimensionality $m$.

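Let $\lambda_1,\dots,\lambda_n$ be the roots of $|A-\lambda B| = 0$, and write

$$\Lambda = \mathrm{dg}(\lambda_1,\dots,\lambda_n).$$

The value $\omega$ occurs exactly $m$ times among $\lambda_1,\dots,\lambda_n$, and therefore

$$R(\Lambda-\omega I) = n-m.$$

By Theorem 13.4.3 there exists a non-singular matrix U such that

$$U^TAU = \Lambda,\qquad U^TBU = I.$$

Hence

$$R(A-\omega B) = R\{U^T(A-\omega B)U\} = R(\Lambda-\omega I) = n-m,$$

so that the solutions of $(A-\omega B)x = 0$ form a space of dimensionality $n-(n-m) = m$, and the theorem follows.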
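Theorem 13.4.5 can be checked on a small case by exact rank computation. In the sketch below (our own, with hypothetical helper names), A and B are the matrices of the worked example following Theorem 13.4.7, where $|A-\lambda B| = 0$ has the simple root $-1$ and the double root $1$; the dimension of the solution space of $(A-\omega B)x = 0$ is computed as $n-R(A-\omega B)$ in exact rational arithmetic.

```python
from fractions import Fraction

def rank(M):
    """Rank of a matrix by exact Gaussian elimination over the rationals."""
    R = [[Fraction(v) for v in row] for row in M]
    rows, cols, r = len(R), len(R[0]), 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if R[i][c] != 0), None)
        if pivot is None:
            continue
        R[r], R[pivot] = R[pivot], R[r]
        for i in range(rows):
            if i != r and R[i][c] != 0:
                f = R[i][c] / R[r][c]
                R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        r += 1
    return r

def pencil(A, B, w):
    """The matrix A - w B."""
    return [[a - w * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

A = [[0, -1, 1], [-1, 1, -1], [1, -1, 2]]
B = [[2, 3, 1], [3, 9, -1], [1, -1, 2]]
n = len(A)
for w, m in [(-1, 1), (1, 2)]:        # (root, multiplicity)
    dim = n - rank(pencil(A, B, w))   # dimension of {x : (A - wB)x = 0}
    print(w, m, dim)                  # prints "-1 1 1", then "1 2 2"
```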


Theorem 13.4.6. Let $m > 1$. If $\omega$ is an m-fold root of the equation $|A-\lambda B| = 0$, then there exist $m$ real non-zero vectors which satisfy $(A-\omega B)x = 0$ and are B-orthogonal in pairs.

Let $x_1$ be any non-zero vector satisfying

$$(A-\omega B)x_1 = 0.$$

Next choose, in succession, non-zero vectors $x_2,\dots,x_m$ such that, for $k = 2,\dots,m$,

$$(A-\omega B)x_k = 0, \tag{13.4.5}$$

$$x_1^TBx_k = 0,\ \dots,\ x_{k-1}^TBx_k = 0. \tag{13.4.6}$$

This choice is possible for every value of $k$. For suppose that $x_1,x_2,\dots,x_{k-1}$ have been already chosen. By Theorem 13.4.5 the space of vectors $x$ satisfying $(A-\omega B)x = 0$ has dimensionality $m$. Let the (real) vectors $\xi_1,\dots,\xi_m$ constitute a basis of this space. Then $x_k$ satisfies (13.4.5) if and only if it is of the form

$$x_k = \alpha_1\xi_1+\dots+\alpha_m\xi_m.$$

Writing, for brevity, $Bx_i = y_i$ $(i = 1,\dots,k-1)$ we see that (13.4.6) means that $(x_k,y_i) = 0$ for $i = 1,\dots,k-1$, i.e.

$$\alpha_1(\xi_1,y_i)+\dots+\alpha_m(\xi_m,y_i) = 0 \qquad (i = 1,\dots,k-1).$$

This is a system of $k-1\le m-1<m$ linear homogeneous equations in the $m$ unknowns $\alpha_1,\dots,\alpha_m$. Hence values of $\alpha_1,\dots,\alpha_m$, not all zero, can be found satisfying these equations. There exists, therefore, a vector $x_k\neq0$ satisfying (13.4.5) and (13.4.6). The theorem now follows by induction with respect to $k$.
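The proof obtains each new vector by solving a homogeneous linear system. When B is positive definite, an alternative route, which we substitute here purely for illustration, is Gram–Schmidt orthogonalization with respect to the inner product $x^TBy$: each new vector stays in the span of the given basis, so (13.4.5) still holds, while (13.4.6) is enforced by subtraction. The sketch assumes the matrix B and the basis $(2,-1,0)^T$, $(0,0,1)^T$ of the solutions of $p+2q = 0$ from the worked example after Theorem 13.4.7; helper names are hypothetical.

```python
from fractions import Fraction

def b_inner(x, y, B):
    """The bilinear form x^T B y."""
    return sum(x[i] * B[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))

def b_orthogonalize(basis, B):
    """Gram-Schmidt with respect to x^T B y (B real symmetric positive definite)."""
    out = []
    for v in basis:
        w = [Fraction(c) for c in v]
        for u in out:
            f = b_inner(w, u, B) / b_inner(u, u, B)
            w = [wi - f * ui for wi, ui in zip(w, u)]
        out.append(w)
    return out

B = [[2, 3, 1], [3, 9, -1], [1, -1, 2]]
basis = [[2, -1, 0], [0, 0, 1]]   # a basis of {x : (A - B)x = 0}, i.e. p + 2q = 0
x1, x2 = b_orthogonalize(basis, B)
print(x2)                          # proportional to (-6, 3, 5)
print(b_inner(x1, x2, B))         # 0
```

The second vector comes out as $\tfrac15(-6,3,5)^T$, in agreement with the vector $(-6c,3c,5c)$ found by hand in the worked example.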





Theorem 13.4.7. Let A, B be real symmetric $n\times n$ matrices, and let B be positive definite. Let, moreover, $\omega_1,\dots,\omega_s$ be the distinct values of the roots of the equation $|A-\lambda B| = 0$ and denote their multiplicities by $m_1,\dots,m_s$ respectively (so that $m_1+\dots+m_s = n$). Corresponding to each root $\omega_r$ let $m_r$ real non-zero vectors be chosen which satisfy the equation $(A-\omega_rB)x = 0$ and are B-orthogonal in pairs.† The vectors $x_1,\dots,x_n$ so obtained may, furthermore, be assumed to satisfy the relations $x_r^TBx_r = 1$ $(r = 1,\dots,n)$.‡ The (real) matrix U having $x_1,\dots,x_n$ as its columns is then non-singular, and the substitution $x = Uy$ transforms $x^TAx$, $x^TBx$ into

$$\lambda_1y_1^2+\dots+\lambda_ny_n^2,\qquad y_1^2+\dots+y_n^2$$

respectively, where $\lambda_1,\dots,\lambda_n$ are the roots of $|A-\lambda B| = 0$ (with their correct multiplicities).§


We note that, for $r,s = 1,\dots,n$,

$$x_r^TBx_s = \delta_{rs}. \tag{13.4.7}$$

For, when $r = s$, (13.4.7) holds by our normalization; when $r\neq s$ but $x_r$, $x_s$ are associated with the same root of $|A-\lambda B| = 0$, then (13.4.7) is satisfied by construction; while when $x_r$, $x_s$ are associated with different roots, (13.4.7) is satisfied by virtue of Theorem 13.4.1. The proof is now completed in precisely the same way as the proof of Theorem 13.4.2.

Exercise 13.4.3. Supply the details of the proof given above in outline. 

The simultaneous transformation described by Theorem 13.4.7 is effected in a single step and its matrix can therefore be computed without too much trouble in numerical cases. Consider, for example, the two quadratic forms

$$y^2+2z^2-2yz+2zx-2xy,\qquad 2x^2+9y^2+2z^2-2yz+2zx+6xy.$$

Their matrices are

$$A = \begin{pmatrix}0 & -1 & 1\\ -1 & 1 & -1\\ 1 & -1 & 2\end{pmatrix},\qquad B = \begin{pmatrix}2 & 3 & 1\\ 3 & 9 & -1\\ 1 & -1 & 2\end{pmatrix},$$

and it is easily verified (say by Theorem 13.3.1) that B is positive definite.


† When $m_r > 1$, this is possible by Theorem 13.4.6; and when $m_r = 1$ the condition of B-orthogonality is, of course, inoperative.

‡ This is possible since B is positive definite.

§ The construction of the matrix U should be compared with the construction of a matrix effecting the orthogonal reduction of a single symmetric matrix (p. 303).





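Now

$$A-\lambda B = \begin{pmatrix}-2\lambda & -1-3\lambda & 1-\lambda\\ -1-3\lambda & 1-9\lambda & -1+\lambda\\ 1-\lambda & -1+\lambda & 2-2\lambda\end{pmatrix},$$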


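and the roots of $|A-\lambda B| = 0$ are found to be $\lambda = -1, 1, 1$. Let $(p,q,r)^T$ be a vector associated with the root $\lambda$, so that

$$\begin{pmatrix}-2\lambda & -1-3\lambda & 1-\lambda\\ -1-3\lambda & 1-9\lambda & -1+\lambda\\ 1-\lambda & -1+\lambda & 2-2\lambda\end{pmatrix}\begin{pmatrix}p\\ q\\ r\end{pmatrix} = 0.$$

Simplifying this we obtain

$$\begin{aligned}2\lambda p+(3\lambda+1)q+(\lambda-1)r &= 0,\\ (3\lambda+1)p+(9\lambda-1)q-(\lambda-1)r &= 0,\\ (1-\lambda)(p-q+2r) &= 0.\end{aligned} \tag{13.4.8}$$

For $\lambda = -1$ this reduces to

$$p+q+r = 0,\qquad p+5q-r = 0,\qquad p-q+2r = 0,$$

and so $(p,q,r) = (3a,-a,-2a)$.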

For $\lambda = 1$, (13.4.8) reduces to $p+2q = 0$. We need to find two vectors, say $(p',q',r')^T$, $(p'',q'',r'')^T$, satisfying this equation and B-orthogonal to each other. If we take

$$(p',q',r') = (2b,-b,0), \tag{13.4.9}$$

then

$$p''+2q'' = 0,\qquad (p'',q'',r'')B(p',q',r')^T = 0,$$

and in view of (13.4.9) this means

$$p''+2q'' = 0,\qquad p''-3q''+3r'' = 0.$$

Hence $(p'',q'',r'') = (-6c,3c,5c)$.


The three vectors now chosen must each be made to satisfy the condition $(p,q,r)B(p,q,r)^T = 1$, i.e.

$$2p^2+9q^2+2r^2-2qr+2rp+6pq = 1.$$

This gives $a = 1$, $b = 1/\sqrt5$, $c = 1/\sqrt5$, and the required reducing matrix is therefore given by

$$U = \begin{pmatrix}3 & 2/\sqrt5 & -6/\sqrt5\\ -1 & -1/\sqrt5 & 3/\sqrt5\\ -2 & 0 & \sqrt5\end{pmatrix}.$$





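The substitution

$$x = 3\xi+\frac{2}{\sqrt5}\eta-\frac{6}{\sqrt5}\zeta,\qquad y = -\xi-\frac{1}{\sqrt5}\eta+\frac{3}{\sqrt5}\zeta,\qquad z = -2\xi+\sqrt5\,\zeta$$

therefore reduces the two given quadratic forms to $-\xi^2+\eta^2+\zeta^2$ and $\xi^2+\eta^2+\zeta^2$ respectively.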
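The reducing matrix can be checked directly. The Python sketch below is our own check, not part of the text: it forms $U^TAU$ and $U^TBU$ for the matrices A, B and U of this example in plain floating-point arithmetic, and up to rounding these should be $\mathrm{dg}(-1,1,1)$ and $I$. The helper names are hypothetical.

```python
import math

def transpose(X):
    return [list(row) for row in zip(*X)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def congruent(U, M):
    """U^T M U."""
    return matmul(matmul(transpose(U), M), U)

A = [[0, -1, 1], [-1, 1, -1], [1, -1, 2]]
B = [[2, 3, 1], [3, 9, -1], [1, -1, 2]]
s = math.sqrt(5)
U = [[3, 2 / s, -6 / s],
     [-1, -1 / s, 3 / s],
     [-2, 0, s]]

UAU, UBU = congruent(U, A), congruent(U, B)
for name, M in (("U^T A U", UAU), ("U^T B U", UBU)):
    # round away floating-point noise; adding 0.0 turns -0.0 into 0.0
    print(name, [[round(v, 12) + 0.0 for v in row] for row in M])
```

Each column of U was normalized separately, so the check also confirms the values $a = 1$, $b = c = 1/\sqrt5$.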

When hermitian forms are considered in place of quadratic forms 
the theory of simultaneous reduction remains virtually unchanged, 
and we leave it to the reader to supply the details. 

Exercise 13.4.4. Let A, B be hermitian matrices and suppose that B is positive definite. Show that all roots of the equation $|A-\lambda B| = 0$ are real.

Exercise 13.4.5. Let $\bar x^TAx$, $\bar x^TBx$ be hermitian forms of which the second is positive definite. Show that there exists a non-singular matrix U such that the substitution $x = Uy$ transforms the given hermitian forms into $\lambda_1\bar y_1y_1+\dots+\lambda_n\bar y_ny_n$, $\bar y_1y_1+\dots+\bar y_ny_n$ respectively, where $\lambda_1,\dots,\lambda_n$ are the roots of $|A-\lambda B| = 0$.

Deduce that if $\bar x^TAx$ is also positive definite, then all roots of $|A-\lambda B| = 0$ are real and positive.

Exercise 13.4.6. Let $\bar x^TAx$, $\bar x^TBx$ be hermitian forms of which the second is positive definite. Show that there exists a matrix P, satisfying $|\det P| = 1$, such that the substitution $x = Py$ transforms both the given hermitian forms into diagonal hermitian forms.

13.4.4. One of the most important applications of the theory of simultaneous reduction of quadratic forms arises in the study of small vibrations, and was noted by Weierstrass in 1858. The configuration of a dynamical system can be specified by means of a set of Lagrangian coordinates $q_1,\dots,q_n$, and if the $q$'s are measured from zero at a certain configuration of stable equilibrium, then (to the first order) the kinetic energy $T$ is a quadratic form in $\dot q_1,\dots,\dot q_n$, which is necessarily positive definite; and the potential energy $V$ is a quadratic form (also positive definite) in $q_1,\dots,q_n$.

Now, by applying a suitable non-singular linear transformation

$$u_r = \sum_{s=1}^n c_{rs}q_s \qquad (r = 1,\dots,n),$$

which implies, of course, that

$$q_r = \sum_{s=1}^n d_{rs}u_s \qquad (r = 1,\dots,n), \tag{13.4.10}$$

where $(d_{rs})$ is the inverse of $(c_{rs})$, we can reduce both $T$ and $V$ to diagonal forms. Suppose, therefore, that

$$2T = \sum_{r=1}^n\alpha_r\dot u_r^2,\qquad 2V = \sum_{r=1}^n\beta_ru_r^2,$$

where all coefficients are positive. Lagrange's equations

$$\frac{d}{dt}\left(\frac{\partial T}{\partial\dot u_r}\right)-\frac{\partial T}{\partial u_r}+\frac{\partial V}{\partial u_r} = 0 \qquad (r = 1,\dots,n)$$

then take the simple form

$$\alpha_r\ddot u_r+\beta_ru_r = 0 \qquad (r = 1,\dots,n),$$

each equation involving only one of the coordinates. These equations for $u_1,\dots,u_n$ can be solved at once in terms of trigonometric functions and the original coordinates $q_1,\dots,q_n$ then obtained from (13.4.10).†
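For a system with two degrees of freedom the frequencies of the normal vibrations can be sketched directly. If Theorem 13.4.7 is used for the reduction, with $2T = \dot q^TM\dot q$ and $2V = q^TKq$, then $\alpha_r = 1$ and $\beta_r = \lambda_r$, where $\lambda_1,\dots,\lambda_n$ are the roots of $|K-\lambda M| = 0$; the equations $\ddot u_r+\lambda_ru_r = 0$ then give $u_r = C_r\cos(\sqrt{\lambda_r}\,t+\varepsilon_r)$. The matrix names M, K, the function name, and the example system (two unit masses joined by three unit springs) are assumptions made for illustration only.

```python
import math

def normal_frequencies_2dof(M, K):
    """Frequencies sqrt(lam) for the roots lam of |K - lam M| = 0 (2x2 case).

    M and K are the (positive definite) matrices of 2T and 2V.
    """
    (m11, m12), (_, m22) = M
    (k11, k12), (_, k22) = K
    c2 = m11 * m22 - m12 * m12                   # |K - lam M| = c2 lam^2 + c1 lam + c0
    c1 = 2 * k12 * m12 - (k11 * m22 + k22 * m11)
    c0 = k11 * k22 - k12 * k12
    disc = math.sqrt(c1 * c1 - 4 * c2 * c0)      # real, since M is positive definite
    lams = sorted([(-c1 - disc) / (2 * c2), (-c1 + disc) / (2 * c2)])
    return [math.sqrt(lam) for lam in lams]

# Hypothetical example: 2T = q1'^2 + q2'^2 and
# 2V = q1^2 + (q2 - q1)^2 + q2^2 = 2 q1^2 - 2 q1 q2 + 2 q2^2.
M = [[1, 0], [0, 1]]
K = [[2, -1], [-1, 2]]
print(normal_frequencies_2dof(M, K))   # frequencies 1 and sqrt(3)
```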


13.5. The inequalities of Hadamard, Minkowski, Fischer, 
and Oppenheim 

The theory of positive definite hermitian forms gives rise to a number of striking inequalities, some of which we propose to discuss below. In our discussion we follow substantially the treatment given by A. Oppenheim.‡

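If $A = (a_{rs})$ is a given $n\times n$ matrix then $A_{11}$ will denote, as usual, the determinant of the matrix obtained from A by deleting the first row and the first column. When A is a positive definite hermitian matrix, then $A_{11} > 0$ and we define

$$\alpha = |A|/A_{11}.$$

If A is positive semi-definite, we put $\alpha = 0$; so that in either case

$$|A| = \alpha A_{11}. \tag{13.5.1}$$

The matrix $A'$ is defined as

$$A' = \begin{pmatrix}a_{11}-\alpha & a_{12} & \dots & a_{1n}\\ a_{21} & a_{22} & \dots & a_{2n}\\ \vdots & \vdots & & \vdots\\ a_{n1} & a_{n2} & \dots & a_{nn}\end{pmatrix}.$$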

Our subsequent discussion rests on the following preliminary 
remark. 

Theorem 13.5.1. If A is positive definite or positive semi-definite, then $A'$ is positive semi-definite.

† For further discussion of the theory see Whittaker, Analytical Dynamics (3rd edition), chap. vii.

‡ J. London Math. Soc. 5 (1930), 114-19.




XIII, §13.5 INEQUALITIES OF HADAMARD, ETC. 417

When A is positive semi-definite, then $A' = A$ and there is nothing to prove. When A is positive definite we have

$$|A'| = |A|-\alpha A_{11} = 0.$$

Moreover, we know by Theorem 13.3.2 (p. 403) that

$$\begin{vmatrix}a_{mm} & \dots & a_{mn}\\ \vdots & & \vdots\\ a_{nm} & \dots & a_{nn}\end{vmatrix} > 0 \qquad (m = 2,\dots,n).$$

The assertion now follows by Exercise 13.3.3 (p. 407).


Theorem 13.5.2. If $A = (a_{rs})$ is a positive definite hermitian $n\times n$ matrix, then

$$|A| \le a_{11}a_{22}\cdots a_{nn}, \tag{13.5.2}$$

with equality in (13.5.2) if and only if A is diagonal.


The proof is by induction with respect to $n$. For $n = 1$ the theorem is true trivially. Assume, then, that it holds for $n-1$, where $n\ge2$. Since A is positive definite we know, by Theorem 13.5.1, that $A'$ is positive semi-definite. Hence, by Theorem 13.3.5 (p. 405), all diagonal elements of $A'$ are non-negative, and so $\alpha\le a_{11}$. Therefore, by (13.5.1) and Theorem 13.3.5,

$$|A| \le a_{11}A_{11}. \tag{13.5.3}$$

Now $A_{11}$ is the determinant of a hermitian positive definite matrix of order $n-1$. Hence, by the induction hypothesis,

$$A_{11} \le a_{22}\cdots a_{nn}, \tag{13.5.4}$$

and (13.5.2) follows from (13.5.3) and (13.5.4).

Next, suppose that

$$|A| = a_{11}a_{22}\cdots a_{nn}. \tag{13.5.5}$$

Then, by (13.5.3) and (13.5.4), $A_{11} = a_{22}\cdots a_{nn}$; by virtue of the induction hypothesis this implies that $a_{rs} = 0$ for $r,s = 2,\dots,n$, $r\neq s$. Hence, by (13.5.5),


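$$\begin{vmatrix}a_{11} & a_{12} & a_{13} & \dots & a_{1n}\\ a_{21} & a_{22} & 0 & \dots & 0\\ a_{31} & 0 & a_{33} & \dots & 0\\ \vdots & \vdots & \vdots & & \vdots\\ a_{n1} & 0 & 0 & \dots & a_{nn}\end{vmatrix} = a_{11}a_{22}\cdots a_{nn}.$$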




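Subtracting, for $r = 2,\dots,n$, $a_{1r}/a_{rr}$ times the $r$th row from the first row we obtain

$$\left(a_{11}-\sum_{r=2}^n\frac{a_{1r}a_{r1}}{a_{rr}}\right)a_{22}\cdots a_{nn} = a_{11}a_{22}\cdots a_{nn},\quad\text{i.e.}\quad \sum_{r=2}^n\frac{|a_{1r}|^2}{a_{rr}} = 0,$$

since $a_{r1} = \bar a_{1r}$ and $a_{22},\dots,a_{nn} > 0$. This implies that $a_{12} = \dots = a_{1n} = 0$. The matrix A is therefore diagonal.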

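An alternative proof can be made to depend on the inequality of the arithmetic and geometric means.† The characteristic roots, say $\lambda_1,\dots,\lambda_n$, of a positive definite hermitian matrix C are positive and satisfy, therefore, the relation

$$(\lambda_1\cdots\lambda_n)^{1/n} \le \frac{\lambda_1+\dots+\lambda_n}{n},$$

i.e.

$$|C| \le \left(\frac{\operatorname{tr}C}{n}\right)^n. \tag{13.5.6}$$

Here the sign of equality holds if and only if all $\lambda$'s are equal; by Exercise 10.1.1 (p. 292) this occurs if and only if C is a scalar matrix. If $A = (a_{rs})$ is positive definite, then

$$C = \mathrm{dg}(a_{11}^{-1/2},\dots,a_{nn}^{-1/2})\,A\,\mathrm{dg}(a_{11}^{-1/2},\dots,a_{nn}^{-1/2}) \tag{13.5.7}$$

is also positive definite by Exercise 13.1.1 (p. 396); and for this matrix

$$|C| = |A|(a_{11}\cdots a_{nn})^{-1},\qquad \operatorname{tr}C = n.$$

Hence (13.5.2) follows by (13.5.6). Moreover, we have equality in (13.5.2) if and only if the matrix C, given by (13.5.7), is scalar, i.e. (since every diagonal element of C is 1) if and only if $C = I$, that is, if and only if A is diagonal.‡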
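Theorem 13.5.2, together with the quantity $\alpha$ of (13.5.1), can be checked exactly on a small example. The sketch below is our own illustration: it uses Python's `fractions` module for exact determinants, takes as A the positive definite matrix called B in the worked example of §13.4, and uses a diagonal matrix for the equality case. Helper names are hypothetical.

```python
from fractions import Fraction

def det(M):
    """Exact determinant by Gaussian elimination over the rationals."""
    R = [[Fraction(v) for v in row] for row in M]
    n, d = len(R), Fraction(1)
    for c in range(n):
        p = next((i for i in range(c, n) if R[i][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            R[c], R[p] = R[p], R[c]
            d = -d
        d *= R[c][c]
        for i in range(c + 1, n):
            f = R[i][c] / R[c][c]
            R[i] = [a - f * b for a, b in zip(R[i], R[c])]
    return d

def diag_product(M):
    out = Fraction(1)
    for i in range(len(M)):
        out *= M[i][i]
    return out

A = [[2, 3, 1], [3, 9, -1], [1, -1, 2]]      # real symmetric, positive definite
A11 = det([row[1:] for row in A[1:]])        # delete first row and first column
alpha = det(A) / A11                         # the alpha of (13.5.1)
print(det(A), diag_product(A), alpha)        # 1 36 1/17
```

Here $|A| = 1 \le 36$, and $\alpha = 1/17 \le a_{11} = 2$, as used in the inductive proof.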

Theorem 13.5.2 enables us to derive a celebrated result proved by Hadamard in 1893.


Theorem 13.5.3. (Hadamard's inequality)

If $A = (a_{rs})$ is any complex non-singular $n\times n$ matrix, then§

$$|\det A|^2 \le \prod_{s=1}^n\left(\sum_{r=1}^n|a_{rs}|^2\right). \tag{13.5.8}$$

To prove the theorem we consider the Gram matrix $B = \bar A^TA = (b_{rs})$ of A. Since $\bar x^TBx = |Ax|^2$, it follows that $\bar x^TBx\ge0$ for all $x$, and $\bar x^TBx = 0$ if and only if $Ax = 0$, i.e. (A being non-singular) if and only if $x = 0$. The hermitian matrix B is therefore positive definite, and applying to it the inequality (13.5.2) we obtain

$$|\bar A^TA| \le b_{11}\cdots b_{nn},$$

where, for $r = 1,\dots,n$,

$$b_{rr} = (\bar A^TA)_{rr} = \sum_{k=1}^n|a_{kr}|^2.$$

Since $|\bar A^TA| = \overline{\det A}\,\det A = |\det A|^2$, the assertion follows.

† This proof is taken from Hardy, Littlewood, and Pólya, Inequalities, 34-35.

‡ For yet another proof of Theorem 13.5.2 see Schwerdtfeger, 5, 149-50.

§ For a geometrical interpretation of this inequality see Hardy, Littlewood, and Pólya, op. cit. 34.

The inequality (13.5.8) may clearly be restated in the form

$$|\det A| \le |A_{*1}|\cdots|A_{*n}|,$$

and in view of the symmetry between rows and columns this implies

$$|\det A| \le |A_{1*}|\cdots|A_{n*}|.$$

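Corollary. If $A = (a_{rs})$ is any complex $n\times n$ matrix and $p = \max_{r,s}|a_{rs}|$, then

$$|\det A| \le p^nn^{n/2}.$$

This inequality is an immediate consequence of Theorem 13.5.3 (each column has norm at most $p\sqrt n$, and the case $\det A = 0$ is trivial). It will be recalled that it was derived from a different source in §10.4.2.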
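Hadamard's inequality (13.5.8) and the corollary can be tried on integer matrices with exact determinants. The sketch below is illustrative only: the first matrix is arbitrary, and the second is the 4×4 matrix of Sylvester's construction, whose entries are $\pm1$ and whose columns are orthogonal, so that it attains equality both in (13.5.8) and in the corollary (with $p = 1$). Complex entries are not handled, since the exact determinant routine works over the rationals; helper names are hypothetical.

```python
import math
from fractions import Fraction

def det(M):
    """Exact determinant by Gaussian elimination over the rationals."""
    R = [[Fraction(v) for v in row] for row in M]
    n, d = len(R), Fraction(1)
    for c in range(n):
        p = next((i for i in range(c, n) if R[i][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            R[c], R[p] = R[p], R[c]
            d = -d
        d *= R[c][c]
        for i in range(c + 1, n):
            f = R[i][c] / R[c][c]
            R[i] = [a - f * b for a, b in zip(R[i], R[c])]
    return d

def column_norms_squared(M):
    """The sums of |a_rs|^2 over r, one for each column s."""
    n = len(M)
    return [sum(abs(M[r][s]) ** 2 for r in range(n)) for s in range(n)]

A = [[1, 2, 0], [3, -1, 4], [0, 5, 2]]
lhs, rhs = det(A) ** 2, math.prod(column_norms_squared(A))
print(lhs, rhs)                              # 1156 6000

H = [[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]
n, p = len(H), max(abs(v) for row in H for v in row)
print(abs(det(H)), math.prod(column_norms_squared(H)) ** 0.5, p ** n * n ** (n / 2))
# all three values equal 16
```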

The next result we prove is due to Minkowski. 

Theorem 13.5.4. If A and B are positive definite hermitian matrices of order $n$, then

$$|A|^{1/n}+|B|^{1/n} \le |A+B|^{1/n}, \tag{13.5.9}$$

with equality in (13.5.9) if and only if B is a scalar multiple of A.

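In the proof of this theorem we shall use a special case of Hölder's inequality† which states that if $\alpha_1,\dots,\alpha_n,\beta_1,\dots,\beta_n$ are positive numbers, then

$$(\alpha_1\cdots\alpha_n)^{1/n}+(\beta_1\cdots\beta_n)^{1/n} \le \{(\alpha_1+\beta_1)\cdots(\alpha_n+\beta_n)\}^{1/n}, \tag{13.5.10}$$

with equality if and only if there exists a number $\kappa$ such that

$$\beta_i = \kappa\alpha_i \qquad (i = 1,\dots,n). \tag{13.5.11}$$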

We know that the positive definite hermitian forms $\bar x^TAx$, $\bar x^TBx$ can be reduced simultaneously to diagonal form. In fact, by Exercise 13.4.6 (p. 415), there exists a matrix P such that $|\det P| = 1$ and

$$\bar P^TAP = \mathrm{dg}(\alpha_1,\dots,\alpha_n),\qquad \bar P^TBP = \mathrm{dg}(\beta_1,\dots,\beta_n),$$

where all $\alpha$'s and $\beta$'s are, of course, positive. This implies that

$$\bar P^T(A+B)P = \mathrm{dg}(\alpha_1+\beta_1,\dots,\alpha_n+\beta_n).$$

Now, since $|\det P| = 1$,

$$|A| = \alpha_1\cdots\alpha_n,\qquad |B| = \beta_1\cdots\beta_n,\qquad |A+B| = (\alpha_1+\beta_1)\cdots(\alpha_n+\beta_n),$$

and the inequality (13.5.9) now follows by (13.5.10). Moreover, there is equality in (13.5.9) if and only if (13.5.11) is satisfied, i.e. if and only if $\mathrm{dg}(\beta_1,\dots,\beta_n) = \kappa\,\mathrm{dg}(\alpha_1,\dots,\alpha_n)$. This means that $\bar P^TBP = \kappa\bar P^TAP$, i.e. $B = \kappa A$.

† See Hardy, Littlewood, and Pólya, op. cit. 21-24.
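A numerical check of (13.5.9), its equality case, and the corollary below. In this sketch (ours, with hypothetical helper names) determinants are computed exactly and the $n$th roots in floating point, so equality shows up only up to rounding; A is again the positive definite matrix called B in the worked example of §13.4.

```python
from fractions import Fraction

def det(M):
    """Exact determinant by Gaussian elimination over the rationals."""
    R = [[Fraction(v) for v in row] for row in M]
    n, d = len(R), Fraction(1)
    for c in range(n):
        p = next((i for i in range(c, n) if R[i][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            R[c], R[p] = R[p], R[c]
            d = -d
        d *= R[c][c]
        for i in range(c + 1, n):
            f = R[i][c] / R[c][c]
            R[i] = [a - f * b for a, b in zip(R[i], R[c])]
    return d

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def root_det(M):
    """|M|^(1/n): determinant exact, root in floating point."""
    return float(det(M)) ** (1.0 / len(M))

def minkowski_gap(A, B):
    """|A+B|^(1/n) - |A|^(1/n) - |B|^(1/n), non-negative by (13.5.9)."""
    return root_det(add(A, B)) - root_det(A) - root_det(B)

A = [[2, 3, 1], [3, 9, -1], [1, -1, 2]]
I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
A3 = [[3 * v for v in row] for row in A]
print(minkowski_gap(A, I3))           # about 1.53 > 0, since I is not a multiple of A
print(minkowski_gap(A, A3))           # about 0: B = 3A gives equality
print(det(A) + det(I3), det(add(A, I3)))   # 2 44, as the corollary predicts
```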

Corollary. If A and B are positive definite hermitian matrices of order greater than 1, then

$$|A|+|B| < |A+B|.$$

Exercise 13.5.1. Show that the inequality (13.5.9) still holds when the 
matrices A and B are either positive definite or positive semi-definite. 

We next deduce an inequality discovered by E. Fischer in 1908. 

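Theorem 13.5.5. If $A = (a_{rs})$ is a positive definite hermitian $n\times n$ matrix, then, for $r = 1,\dots,n-1$,

$$\begin{vmatrix}a_{11} & \dots & a_{1r}\\ \vdots & & \vdots\\ a_{r1} & \dots & a_{rr}\end{vmatrix}\cdot\begin{vmatrix}a_{r+1,r+1} & \dots & a_{r+1,n}\\ \vdots & & \vdots\\ a_{n,r+1} & \dots & a_{nn}\end{vmatrix} \ge \begin{vmatrix}a_{11} & \dots & a_{1n}\\ \vdots & & \vdots\\ a_{n1} & \dots & a_{nn}\end{vmatrix}. \tag{13.5.12}$$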

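Denote by B the matrix obtained from A when the first $r$ rows and then the first $r$ columns of the latter are multiplied by $-1$. Then B is again positive definite by Exercise 13.1.1 (p. 396), and $|B| = |A|$. Moreover, $A+B$ is obtained from $2A$ by replacing by zeros all elements $a_{ij}$ with $i\le r<j$ or $j\le r<i$, so that

$$|A+B| = 2^n\Delta,$$

where $\Delta$ denotes the value of the left-hand side in (13.5.12). Substituting in (13.5.9), we obtain $2|A|^{1/n}\le2\Delta^{1/n}$, i.e. $|A|\le\Delta$, which is the required inequality.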
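The device used in this proof, and the inequality (13.5.12) itself, can be checked exactly on a 3×3 example. The sketch below is our own illustration with hypothetical helper names; the matrix is the positive definite matrix called B in the worked example of §13.4. For each $r$ it compares $|A|$ with $\Delta$ and confirms that the sign-changed matrix has the same determinant as A, while the sum has determinant $2^n\Delta$.

```python
from fractions import Fraction

def det(M):
    """Exact determinant by Gaussian elimination over the rationals."""
    R = [[Fraction(v) for v in row] for row in M]
    n, d = len(R), Fraction(1)
    for c in range(n):
        p = next((i for i in range(c, n) if R[i][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            R[c], R[p] = R[p], R[c]
            d = -d
        d *= R[c][c]
        for i in range(c + 1, n):
            f = R[i][c] / R[c][c]
            R[i] = [a - f * b for a, b in zip(R[i], R[c])]
    return d

def principal(M, idx):
    """Principal submatrix on the rows and columns listed in idx."""
    idx = list(idx)
    return [[M[i][j] for j in idx] for i in idx]

def flip_first(M, r):
    """Multiply the first r rows and then the first r columns by -1."""
    n = len(M)
    return [[M[i][j] * (-1 if i < r else 1) * (-1 if j < r else 1) for j in range(n)]
            for i in range(n)]

A = [[2, 3, 1], [3, 9, -1], [1, -1, 2]]
n = len(A)
for r in range(1, n):
    delta = det(principal(A, range(r))) * det(principal(A, range(r, n)))
    B = flip_first(A, r)
    S = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]
    print(r, det(A), delta, det(B) == det(A), det(S) == 2 ** n * delta)
# prints "1 1 34 True True" and "2 1 18 True True"
```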

Exercise 13.5.2. Deduce the inequality of Theorem 13.5.2 from Theorem 13.5.5.





If $A = (a_{rs})$, $B = (b_{rs})$ are two given $n\times n$ matrices, we shall denote by $A\times B$ the matrix whose $(r,s)$th element is $a_{rs}b_{rs}$.

Theorem 13.5.6. If A, B are positive definite or positive semi-definite hermitian $n\times n$ matrices, then the (hermitian) matrix $A\times B$ is also positive definite or positive semi-definite, and

$$|A\times B| \ge b_{11}\cdots b_{nn}|A|. \tag{13.5.13}$$

The fact that A x B is positive definite or positive semi-definite 
was noted by Schur, while the inequality (13.5.13) is due to 
Oppenheim. 

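We know that there exists a non-singular matrix $P = (p_{rs})$ such that $B = P\Lambda\bar P^T$, where $\Lambda = \mathrm{dg}(\beta_1,\dots,\beta_n)$ and $\beta_1,\dots,\beta_n$ are non-negative numbers. Writing $\phi(x_1,\dots,x_n) = \bar x^TAx$, we have

$$b_{rs} = \sum_{k=1}^n\beta_kp_{rk}\bar p_{sk} \qquad (r,s = 1,\dots,n),$$

and therefore

$$\bar x^T(A\times B)x = \sum_{r,s=1}^n\sum_{k=1}^na_{rs}\beta_kp_{rk}\bar p_{sk}\bar x_rx_s = \sum_{k=1}^n\beta_k\,\phi(\bar p_{1k}x_1,\dots,\bar p_{nk}x_n).$$

Hence $\bar x^T(A\times B)x\ge0$ for all $x$, and the first part of the theorem is proved. We have shown, in particular, that

$$|A\times B| \ge 0. \tag{13.5.14}$$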

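To prove (13.5.13), we write $C = A\times B$ and use induction with respect to $n$. For $n = 1$ the assertion is true trivially. Assuming that it holds for $n-1$, where $n\ge2$, we have

$$C_{11} \ge b_{22}\cdots b_{nn}A_{11}. \tag{13.5.15}$$

Now the matrix $A'$ (defined on p. 416) is positive semi-definite by Theorem 13.5.1. Hence, by (13.5.14), $|A'\times B|\ge0$, i.e. (since $A'\times B$ differs from $A\times B$ only in the element $\alpha b_{11}$ subtracted at the $(1,1)$ position)

$$|A\times B|-\alpha b_{11}C_{11} \ge 0,$$

and so, by (13.5.15) and (13.5.1),

$$|A\times B| \ge \alpha b_{11}C_{11} \ge \alpha b_{11}b_{22}\cdots b_{nn}A_{11} = b_{11}\cdots b_{nn}|A|.$$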

From Theorems 13.5.6 and 13.5.2 we deduce at once a further consequence.

Corollary. If A and B are positive definite or positive semi-definite hermitian matrices, then

$$|A\times B| \ge |A|\,|B|.$$
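Theorem 13.5.6, Oppenheim's inequality (13.5.13), the corollary, and Schur's sharper inequality quoted below can all be checked exactly on one example. In this sketch (ours, with hypothetical helper names and example matrices) `schur_product` forms $A\times B$ in the text's notation, i.e. the elementwise product; positive definiteness of $A\times B$ is checked through its leading principal minors, which is our choice of test.

```python
from fractions import Fraction

def det(M):
    """Exact determinant by Gaussian elimination over the rationals."""
    R = [[Fraction(v) for v in row] for row in M]
    n, d = len(R), Fraction(1)
    for c in range(n):
        p = next((i for i in range(c, n) if R[i][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            R[c], R[p] = R[p], R[c]
            d = -d
        d *= R[c][c]
        for i in range(c + 1, n):
            f = R[i][c] / R[c][c]
            R[i] = [a - f * b for a, b in zip(R[i], R[c])]
    return d

def schur_product(A, B):
    """A x B in the text's notation: the elementwise product (a_rs b_rs)."""
    return [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def diag_product(M):
    out = 1
    for i in range(len(M)):
        out *= M[i][i]
    return out

A = [[2, 3, 1], [3, 9, -1], [1, -1, 2]]
B = [[2, 1, 0], [1, 2, 1], [0, 1, 2]]
C = schur_product(A, B)
leading = [det([row[:k] for row in C[:k]]) for k in range(1, 4)]
print(leading)                                            # all positive: 4, 63, 248
print(det(C), diag_product(B) * det(A), det(A) * det(B))  # 248 8 4
print(det(C) + det(A) * det(B) >= diag_product(B) * det(A) + diag_product(A) * det(B))  # True
```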




Exercise 13.5.3. Interpret this corollary for the case $B = I$.

Exercise 13.5.4. Show that, under the same conditions as in Theorem

$|A\times B|$

Exercise 13.5.5. Show that if A and B are positive definite, then so is $A\times B$.

Schur sharpened Theorem 13.5.6 by showing that, under the same conditions,

$$|A\times B|+|A|\,|B| \ge b_{11}\cdots b_{nn}|A|+a_{11}\cdots a_{nn}|B|.$$

In view of Theorem 13.5.2 this inequality clearly implies (13.5.13).


PROBLEMS ON CHAPTER XIII 


1. Show that, if A is a positive definite hermitian matrix, then so is A*. 

2. Let A and B be positive definite or positive semi-definite hermitian matrices. Show that $|A+B|\ge0$.

3. Determine from first principles the value class of the binary quadratic 
form ax^ + 2bxy + 

4. Let $(a_{ij})$, $(b_{ij})$ be two real symmetric $n\times n$ matrices and, for $1\le r\le n$, write

$$\Delta_r(\lambda,\mu) = \begin{vmatrix}\lambda a_{11}+\mu b_{11} & \dots & \lambda a_{1r}+\mu b_{1r}\\ \vdots & & \vdots\\ \lambda a_{r1}+\mu b_{r1} & \dots & \lambda a_{rr}+\mu b_{rr}\end{vmatrix}.$$

Show that, if $\Delta_r(1,0) > 0$ and $\Delta_r(0,1) > 0$ for all $r = 1,\dots,n$, then $\Delta_r(\lambda,\mu) > 0$ for all $r = 1,\dots,n$ and all $\lambda > 0$, $\mu > 0$.

5. Let G be the Gram matrix of the $m\times n$ matrix A. Show that G is a hermitian positive definite or a hermitian positive semi-definite matrix according as $R(A) = n$ or $R(A) < n$.

6. Show that a real symmetric matrix A is positive definite if and only if it can be written in the form $A = P^TP$, where P is some non-singular matrix.

7. Show that, if the real symmetric matrix A is either semi-definite or indefinite, then the region specified by the inequality $x^TAx\le1$ is unbounded.

8. A and B are positive definite hermitian matrices. Show that all roots of the equation $|A-\lambda B| = 0$ are positive. Show further that all roots are equal to 1 if and only if $A = B$.

9. Let A be a hermitian matrix and write

$$|\lambda I-A| = \lambda^n-d_1\lambda^{n-1}+d_2\lambda^{n-2}-\dots+(-1)^nd_n.$$

Show that A is positive definite if and only if $d_1 > 0,\dots,d_n > 0$.

10. If $d$ is the value of the determinant

$$\begin{vmatrix}\cos\alpha\sin\alpha & \cos\alpha & \sin\alpha & 1\\ \cos\beta\sin\beta & \cos\beta & \sin\beta & 1\\ \cos\gamma\sin\gamma & \cos\gamma & \sin\gamma & 1\\ \cos\delta\sin\delta & \cos\delta & \sin\delta & 1\end{vmatrix},$$

show, by Hadamard's inequality, that $|d|\le81/16$.






11. Show that the quadratic form 

is positive semi-definite, and that it vanishes if and only if $x_1 = \dots = x_n$.

12. Determine the range of values of A for which the quaternary quadratic 

X{x'^+y'‘-\-z^)+2xy-2yz-\-2zx+fi 
is positive definite. Also discuss the case A == 2. 

13. Determine the rank, signature, and value class of the ternary quadratic form $ax^2+by^2+cz^2+ayz+bzx+cxy$, where $a$, $b$, $c$ are real numbers, not all zero.

14. Determine, for every (real) value of $\lambda$, the value class of the ternary quadratic form

$$(6\lambda+3)x^2+(12\lambda+5)y^2+5z^2+(8\lambda+2)yz+(4\lambda+6)zx-2xy.$$

15. Show that Theorem 13.4.5 (p. 412) becomes false if either the condition 
of reality or of positive definiteness is omitted. 

16. Show that Hadamard's inequality (Theorem 13.5.3, p. 418) reduces to an equality if and only if the columns of A form an orthogonal set of vectors. Also find necessary and sufficient conditions for the equality

$$|\det A| = |A_{1*}|\cdots|A_{n*}|$$

to be valid.

17. (i) Show that the inequality given by the corollary to Theorem 13.5.3 (p. 419) is best possible for every value of $n$.

(ii) Show that this inequality becomes an equality if and only if the columns of A form an orthogonal set and the elements of A are all equal, in modulus, to $p$.

18. Let $a_{r1},\dots,a_{rn}$ $(r = 1,\dots,m)$ be $m$ given sets of $n$ complex numbers each. By considering the norm of the vector

$$\sum_{r=1}^m\xi_r(a_{r1},\dots,a_{rn}),$$

where $\xi_1,\dots,\xi_m$ are arbitrary complex numbers, obtain Gram's inequality:



Show also that equality occurs if and only if the vectors $(a_{r1},\dots,a_{rn})$, $r = 1,\dots,m$, are linearly dependent.

Prove the same result by considering the Gram matrix of

$$\begin{pmatrix}a_{11} & \dots & a_{m1}\\ \vdots & & \vdots\\ a_{1n} & \dots & a_{mn}\end{pmatrix}.$$

Discuss, in particular, the case $m = 2$ of Gram's inequality.

19. Find a real linear non-singular transformation which reduces one of 
the two quadratic forms 

3a;*+5y *+52* -f- 2yz -f 62 a; — 2a;y, 5a;* +1* -f Syz +42a; 

to $\xi^2+\eta^2+\zeta^2$ and the other to $\lambda\xi^2+\mu\eta^2+\nu\zeta^2$, where $\lambda$, $\mu$, $\nu$ are suitable constants.





20 . Find a real linear non-singular transformation which reduces one of 

the two quadratic forms 

6x^ +6i/® -f +3t/*+3^*— 2i/z +22a;+ 2xy 

to and the other to 

21. Prove Theorem 13.5.2 (p. 417) by writing the positive definite hermitian matrix A in the form



(where $A_1$ is a square matrix of order $n-1$) and using the result of Problem I, 16.

22. Show that the quadratic form $Q(x_1,\dots,x_n)$ is positive definite if and only if it can be written as



where $c_1,\dots,c_n$ are positive numbers and, for $1\le k\le n$, $X_k$ is a linear form in $x_k,x_{k+1},\dots,x_n$, with 1 as the coefficient of $x_k$.

If $Q(x_1,\dots,x_n)$ is expressible in the above form and if

$$Q(x_1,\dots,x_n) \le x_1^2+\dots+x_n^2$$

for all values of $x_1,\dots,x_n$, show that $c_k\le1$ $(1\le k\le n)$, and deduce that the determinant of $Q(x_1,\dots,x_n)$ does not exceed 1.

23. Using Theorem 13.2.1 (p. 398), show that any positive definite hermitian matrix $A = (a_{rs})$ can be represented in the form $A = \bar Q^TQ$, where Q is triangular. Deduce that $|A|\le a_{11}a_{22}\cdots a_{nn}$.

24. $A = (a_{ij})$ and $B = (b_{ij})$ are symmetric $n\times n$ matrices, and the equation $|\lambda A-B| = 0$ has $n$ distinct roots $\lambda_1,\dots,\lambda_n$. Prove that there exists an $n\times n$ matrix $T = (t_{ij})$ such that, for all $i,j = 1,\dots,n$,

$$\lambda_j\sum_{k=1}^na_{ik}t_{kj} = \sum_{k=1}^nb_{ik}t_{kj},$$

and such that T has no column of zeros.

Prove that the matrices $T^TAT$, $T^TBT$ are both diagonal.

25. Let the non-singular quadratic forms $x^TAx$ and $x^TBx$ be simultaneously reducible to diagonal forms $a_1x_1^2+\dots+a_nx_n^2$, $b_1x_1^2+\dots+b_nx_n^2$ respectively. Show that the $n$ roots of the equation $|A-\lambda B| = 0$ are

Show that the quadratic forms $x^2+2xy$ and $2x^2$ are not simultaneously reducible to diagonal form.

26. Let A be a real positive definite matrix. Show that the region R of (real) n-dimensional space specified by the inequality $x^TAx\le1$, where $x = (x_1,\dots,x_n)^T$, is bounded, i.e. there exists a positive constant K such that, for every point $(x_1,\dots,x_n)\in R$, we have $|x_1|\le K,\dots,|x_n|\le K$.

If the volume V of R is defined as the n-fold integral

$$V = \int\!\!\cdots\!\!\int_{x^TAx\le1}dx_1\cdots dx_n,$$

show that



Hence obtain the area enclosed by the ellipse $ax^2+2hxy+by^2 = 1$.




27. Let $\Omega$ be a real, positive definite matrix, and let a real matrix A be called $\Omega$-orthogonal if the substitution $x = Ay$ leaves the quadratic form $x^T\Omega x$ invariant. Show that the most general form of an $\Omega$-orthogonal matrix is $A = M^{-1}PM$, where M is a certain fixed matrix and P an arbitrary orthogonal matrix.

Find the most general $\Omega$-orthogonal matrix when $\Omega = \mathrm{dg}(a_1,\dots,a_n)$ and $a_1 > 0,\dots,a_n > 0$.

28. Let $\Omega$-orthogonality be defined as in the preceding question. Show that the characteristic roots of any $\Omega$-orthogonal matrix are all of unit modulus. Does this statement remain true if the positive definiteness of $\Omega$ is not insisted on?

29. Show that, if H, K are positive definite hermitian matrices and $H^2 = K^2$, then $H = K$.

30. Show that, if A is a non-singular matrix, then there exist positive definite hermitian matrices $H_1$, $H_2$ and a unitary matrix U such that

$$A = UH_1 = H_2U.$$

Verify that this representation of A is unique. (It is known as the polar representation of A.) Show further that $H_1 = H_2$ if and only if A is normal.

31. Show that a matrix A is similar to a diagonal matrix if and only if there exists a positive definite hermitian matrix H such that $H^{-1}AH$ is normal.

32. A and B are positive definite hermitian matrices, and $\alpha$ is a number such that $0 < \alpha < 1$. Show that

$$|\alpha A+(1-\alpha)B| \ge |A|^\alpha|B|^{1-\alpha},$$

and that there is equality if and only if $A = B$. Extend this result to any number of matrices.

33. Show that, if $A_1,\dots,A_k$ are positive definite hermitian matrices of order $n$, then

$$|A_1+\dots+A_k|^{1/n} \ge |A_1|^{1/n}+\dots+|A_k|^{1/n},$$

with equality if and only if $A_1,\dots,A_k$ are all scalar multiples of a single matrix.

34. From the result of Problem VIII, 34 deduce that every positive 
definite hermitian matrix H can be expressed in the form H = where 
A is triangular. 

35. By considering the integral $\int_0^\infty\dots\,dt$ for a suitable quadratic form prove that the determinant is positive, where



Show further that, if the matrix $(a_{rs})$ is positive definite, then $(a_{rs}/(r+s))$ is also positive definite.

36. Let $F(x_1,\dots,x_n) = \sum_{i,j=1}^na_{ij}x_ix_j$, and let $A_r$ denote the determinant formed by the first $r$ rows and $r$ columns of the matrix $(a_{ij})$. Show that, if $A_{n-1}\neq0$, there exists a transformation $x_i = \dots$ $(i = 1,\dots,n)$ such that



Deduce that, if $A_r > 0$ $(r = 1,\dots,n)$, then $F(x_1,\dots,x_n)$ is positive definite.




37. If $x^TAx$ is a positive definite quadratic form and $x^TAx\le x^Tx$ for all real vectors $x$, show that $|A|\le1$. Show further that, if $x^TBx$ and $x^TCx$ are positive definite quadratic forms such that $x^TBx\le x^TCx$ for all real $x$, then $|B|\le|C|$.

If x^Ax is a positive definite quadratic form and 



find a real matrix T of determinant 1 such that 


T^AT 



where Ag = Ag—V^Ar^V. Deduce Fischer’s inequality (Theorem 13.5.5). 

38. Let $s_k$ denote the sum of the $k$th powers of the roots of an algebraic equation of degree $n$ with real coefficients. Show that the $n$ roots are real and distinct if and only if the real symmetric matrix

$$\begin{pmatrix}s_{2n-2} & s_{2n-3} & \dots & s_{n-1}\\ s_{2n-3} & s_{2n-4} & \dots & s_{n-2}\\ \vdots & \vdots & & \vdots\\ s_{n-1} & s_{n-2} & \dots & s_0\end{pmatrix}$$

is positive definite.




BIBLIOGRAPHY 


All books listed below deal wholly or in part with linear algebra. The books of the first series provide a treatment of this subject which is roughly comparable in extent to that offered here. The second list enumerates books which have less affinity with the present treatment but may be usefully consulted on specific points. The books in both series are arranged approximately in order of increasing difficulty.

yl. T. L. Wade: The Algebra of Vectors and Matrices (Cambridge, Mass., 
1951). 

■ 2, W. L. Ferrari Algebra—A Textbook of Determinants , Matrices , and 
Algebraic Forms (Oxford, 1941). 

’ 3. G. Birkhoff and S. MacLane : A Survey of Modern Algebra (New York, 
1946). 

4. O. ScHREiER and V, Sperner: Introduction to Modern Algebra and 

Matrix Theory (New York, 1951). 

5. H. ScHWERDTFEOER: Introduction to Linear Algebra and the Theory of 

Matrices (Groningen, 1950). 

6. S. Perlis: Theory of Matrices (Cambridge, Mass., 1952). 

7. R. R. Stoll : Linear A Igebra and Matrix Theory (Now York-Toronto- 

London, 1952). 

* 8. C. C. MacDuffee: Vectors and Matrices (Cams Monographs, 1943). 


9. C. V. Durell and A. Robson: Advanced Algebra, Volume III (London, 
1937). 

10. A. C. Aitken : Determinants and Matrices (4th edition; Edinburgh and 

London, 1946). 

11. M. Bocher: Introduction to Higher Algebra (New York, 1936). 

12. O. Perron; Algebra (2nd edition; Berlin and Leipzig, 1932-3), 

2 volumes. 

13. H. Hasse : Hohere Algebra, Band I, Lineare Gleichungen (3rd edition; 

.Berlin, 1951). 

14. P. R. Halmos; Finite Dimensional Vector Spaces (Princeton, 1942). 

15. W. L. Ferrari Finite Matrices (Oxford, 1951). 

16. R. A. Frazer, W. J. Duncan, and A. R. Collar; Elementary Matrices 

(Cambridge, 1938). 

17. A. Liohnerowicz I Algebre et analyse liniaires (Paris, 1947). 

18. H. W. Turnbull; The Theory of Determinants, Matrices, and In¬ 

variants (London and Glasgow, 1929). 

19. H. W. Turnbull and A. C. Aitken: An Introduction to the Theory of 

Canonical Matrices (London and Glasgow, 1948). 

20. N. Jacobson : Lectures in Abstract Algebra, Volume II, Linear Algebra 

(New York, 1953). 

21. H. L. Hamburger and M. E. Grimshaw; Linear Transformations in 

n-Dimensional Vector Space (Cambridge, 1951). 

22. C. C. MacDuffee; The Theory of Matrices (Berlin, 1933). 



428 BIBLIOGRAPHY 

23. C. C. MaoDupfbb: An Introdiiction to Abstract Algebra (New York, 

1940). 

24. A. A. Albert: Modem Higher Algebra (Cambridge, 1938). 

25. G. Julia: Introduction mathématique aux théories quantiques. Première partie (2nd edition; Paris, 1949).

26. B. L. van der Waerden: Gruppen von linearen Transformationen (Berlin, 1935).

27. A. Speiser: Die Theorie der Gruppen von endlicher Ordnung (3rd edition; Berlin, 1937).

28. J. H. M. Wedderburn: Lectures on Matrices (New York, 1934). 

29. F. D. Murnaghan: The Theory of Group Representations (Baltimore, 1938).

30. J. Dieudonné: Sur les groupes classiques (Paris, 1948).

31. H. Weyl: The Classical Groups: Their Invariants and Representations (2nd edition; Princeton, 1946).



INDEX 


Abelian groups, 264. 

Adjugate: determinant, 24-27; matrix, 
88-90. 

Affine classification of quadrics, 381-4. 

Algebraic complement, see Cofactor. 

Alternating group, 287. 

Angle of an orthogonal matrix, 236. 

Arrangements, 2-6. 

Augmented matrix, 140. 

Automorphisms of a linear manifold, 
126-6, 268-9. 

Axis of rotation, 240. 

Basis: basis theorems, 61-53; change
of basis in a linear manifold, 111-13; 
completion, 64; orthogonal and ortho- 
normal, 66-68, 156-6. 

Bendixon’s inequality, 210. 

Bessel’s inequality, 70. 

Bilinear: forms, 354-6, 368-9; operators, 363-6; symmetric bilinear operators, 361.

Biunique correspondence, 4. 

Bromwich’s inequality, 388-9. 

Canonical form(s): classical, 312; of 
matrices, 290-316; of quadratic 
forms, 378. 

Cayley and Hamilton, theorem of, 206. 

Characteristic: equation, 196; inequalities for characteristic roots, 210-12, 309-11, 388-90; polynomial, 195, 197-208; regular characteristic roots, 294; roots, 195; roots of rational functions of matrices, 201-2; vectors, 214.

Circulant, 36. 

Classical canonical form, 312. 

Classification of quadrics, 380-5. 

Cofactor: of an element, 14; of a 
minor, 21. 

Collineations, 123, 196, 266, 380-6. 

Column: matrix, 77; rank, 137; vector, 
77. 

Commuting matrices, 81. 

Complements, 64-67; orthogonal, 68- 
69. 

Completion of basis, 64. 

Congruence transformations, 182-6, 
266, 358-60. 

Congruent matrices, 182. 


Consistency of systems of linear equations, 132, 140-1.

Convergence: of matrix sequences, 327- 
9; of matrix series, 330-2. 

Cramer’s rule, 134. 

Critical minor, 136. 

Definite hermitian and quadratic forms,
394, 398-404. 

Determinant(s): adjugate, 24-27; axiomatic characterization, 189-92; definition, 6; dominated by diagonal elements, 32-33; expansions by rows and columns, 15; Gram determinant, 155; inequalities, 33, 212-13, 416-22; of a matrix, 87; multiplication, 12-13; Vandermonde determinant, 17-18.

Determinant rank, 136. 

Determinantal criteria for value classes, 
400-7. 

Diagonal: canonical forms, 290-306; 
matrix, 76. 

Differential: equations, 343-8; operators, 126-7, 133-4.

Differentiation: of determinants, 36; 
of matrices, 339. 

Dimensionality: of a linear manifold, 
60; theorem for homogeneous 
systems, 149. 

Discriminant, 18-19. 

Disposable unknowns, 142. 

Divisors of zero, 95-96, 153-4. 

Elementary: divisors, 194; matrices (E-matrices), 170; operations (E-operations), 168-78.

Equivalence: of bilinear forms, 368; 
class, 187; of matrices, 176; w.r.t. 
operators, 189, 267; of quadratic 
forms, 376-9; relation, 186; of 
systems of linear equations, 142; 
transformations, 177. 

Euler: equations of transformation, 
242; theorem on rigid motion, 246. 

Exponential function of a matrix, 336- 
41. 

Field, 39; real and complex, 40. 

Finite groups, 264, 270-1. 

Fischer’s inequality, 420. 



Forms: bilinear, 354-6, 368-9; hermitian, 386-8, 415; polarized, 362; quadratic, 357-60.

Frobenius: determinantal criterion for 
positive definite forms, 400; theorem 
on non-negative matrices, 329; 
theorem on rank, 162. 

Full linear group, 263, 268-9. 

Functions of matrices, 336, 342-3. 

Fundamental sets of solutions, 151. 

Generators: of a linear manifold, 45; 
of a vector space, 43. 

Gram: determinant and matrix, 156; 
inequality, 423. 

Group(s): abelian, 264; alternating, 287; axioms, 252; centre, 254; congruence group, 266; finite, 254, 270-1; full linear, 263, 268-9; group matrix, 273; isomorphism, 259; of matrices, 261-6; multiplication table, 260-1; order, 264; orthogonal, 263; orthogonal similarity group, 266; of permutations, 257-8, 271-2; projective, 263; representation by matrices, 267-72; rotation group, 263; similarity group, 266; of singular matrices, 272-6; symmetric, 257.

Hadamard’s inequality, 418-19. 

Hamilton, see Cayley. 

Hermitian: forms, 386-8, 415; matrices, 209, 301, 304; positive definite forms, 394, 398-404.

Hirsch’s inequality, 211. 

Homogeneous linear equations, 27-30, 131, 148-52; dimensionality theorem, 149; fundamental sets of solutions, 151.

Idempotent matrix, 107. 

Image space, 277. 

Indefinite hermitian and quadratic 
forms, 394, 407. 

Index laws for matrices, 83, 93-94. 

Inequality(ies): of Bendixon, 210; of Bessel, 70; of Bromwich, 388-9; for characteristic roots, 210-12, 309-11, 388-90; for determinants, 33, 212-13; of Fischer, 420; of Gram, 423; of Hadamard, 418-19; of Hirsch, 211; of Minkowski, 419-20; of Oppenheim, 421; of Ostrowski, 212-13; for positive definite matrices, 416-22; of Schur, 309-11, 422; triangle, 64.


Inertia, law of: for hermitian forms, 
387; for quadratic forms, 377. 

Infinitesimal rotation, 249. 

Inner product, 62. 

Interpolation formula of Sylvester, 221. 

Invariance, 169. 

Invariant: properties of matrices, 122; 
spaces, 277-86. 

Inverse matrix, 91, 178-9. 

Isomorphism: between groups, 259;
between linear manifolds, 68-69, 
123-4. 

Jacobi’s theorem, 26. 

Joachimsthal’s equation, 360-2. 

Klein four-group, 261. 

Kronecker delta, 20. 

Lagrange’s method of reduction, 371-4. 

Laplace’s expansion theorem, 21. 

Latent roots, see Characteristic roots. 

Latent vectors, see Characteristic vectors.

Leading element, 173 (footnote). 

Length of a vector, 63. 

Linear: combination of vectors, 42; dependence, 48, 162-3; equations, 27-30, 131-62, 180-1; forms, 163; full linear group, 263, 268-9; homogeneous equations, 27-30, 131, 148-52; operators (operations, transformations), 114-22, 277; substitutions, 74, 85-87; transformations of a bilinear form, 366; transformations of a quadratic form, 360.

Linear manifold(s): automorphism, 126-6, 268-9; basis, 49-64; change of basis, 111-13; definition, 44; dimensionality, 60; generators, 45; isomorphism, 68-69, 123-4; of matrices, 80; representation by a vector space, 60; spanned by given elements, 46; zero element, 46.

Majorization of matrices, 328. 

Mapping, 113; into, 116; onto, 124. 

Matrix (matrices): addition, 78-79; adjugate, 88-90; augmented, 140; of coefficients, 140; column type and row type, 77; commuting, 81; congruent, 182; definition, 74; determinant of, 87; diagonal, 76; elementary, 170; elementary divisors, 194; equations, 298-300; equivalence, 176; functions, 336, 342-3; Gram, 166; group matrix, 273; groups, 261-6; hermitian, 209, 301, 304; idempotent, 107; index laws, 83, 93-94; invariant properties, 122; inverse, 91, 178-9; linear manifold of, 80; multiplication, 80-86; multiplication by scalars, 78; normal, 305-6, 309-10; orthogonal, 222-9; partitioned, 100-6; periodic, 298; permutation matrices, 271; polynomials, 97-99, 201-8, 342-3; power series, 332-43; rank, 136-40, 158-63; rational functions, 99-100, 201-2, 204-6, 336; representations of groups by matrices, 267-72; scalar, 76; sequences, 327-9; series, 330-2; similar, 119; singular and non-singular, 87; skew-hermitian, 209, 232; skew-symmetric, 208, 227, 243-5; square, 75; symmetric, 183, 209, 301-4; transposed, 76, 97; triangular, 76; unit, 76; unitary, 229-32; zero, 76.

Maximal invariant space: of a linear 
transformation, 277-82; of a group 
of transformations, 282-6. 

Metric classification of quadrics, 385. 

Minimum polynomial, 203-4, 207-8, 
297-8. 

Minkowski: inequality for positive 
definite matrices, 419-20; theorem 
on dominated determinants, 31-32. 

Minor: critical, 136; cofactor of, 21; 
of a determinant, 20; of a matrix, 
136; principal, 197. 

Monic polynomial, 203. 

Motion: proper and improper, 246;
rigid, 246-7. 

Multiplication table for groups, 260-1. 

Non-commutativity of matrix multiplication, 81.

Non-negative matrix, 328-30. 

Non-singular matrix, 87. 

Norm of a vector, 63. 

Normal form, reduction of a matrix to, 
173-6. 

Normal matrix, 305-6, 309-10.

Normalization of a vector, 64. 

Nullity, Sylvester’s law of, 162. 

Operations, elementary, 168-78. 

Operators: bilinear, 363-6; differential, 126-7, 133-4; linear, 114-22, 277; multiplication, 172; polarized, 361; quadratic, 357-9, 361-2.

Oppenheim’s inequality, 421. 

Orthogonal: basis, 65-68; complement, 68-69; congruence group and transformation, 266; group, 263; reduction of quadratic forms, 362-3; reduction of symmetric matrices, 302-4; set, 65-67; similarity, 266; vectors, 64.

Orthogonal matrices, 222-9; angle, 236; 
principal vector, 237; proper and 
improper, 233. 

Orthogonalization process of Schmidt, 
67-68. 

Orthonormal bases and sets, 66-68, 
165-6. 

Ostrowski’s inequality, 212-13. 


Partitioned matrices, 100-6. 

Periodic matrices, 298. 

Permutation matrices, 271. 

Permutations, 256-9; groups of, 257-8, 
271-2. 

Polar representation, 426. 

Polarized: form, 362; operator, 361.

Polynomial: characteristic, 195, 197- 
208; identity and formal equality, 
24; in a matrix, 97-99, 201-8, 342-3; 
minimum, 203-4, 207-8, 297-8;
monic, 203.

Positive definite forms, 394, 398-404. 

Power series of matrices, 332-43. 

Premultiplication and postmultiplication, 81.

Principal: minor, 197; vector, 237.

Principal axes, reduction of quadrics 
to, 364-7. 

Projection, 130. 

Projective: classification of quadrics, 
380-1; group, 263. 


Quadratic form(s): canonical forms, 378; definite and indefinite, 394; definition, 357; determinant, 369; diagonal forms, 362; equivalence, 376-9; general reduction, 369-74; law of inertia, 377; orthogonal reduction, 362-3; positive definite forms, 394, 398-404; as product of two linear forms, 374-6; rank, 360; simultaneous reduction, 408-16; singular and non-singular, 369; unit forms, 362.




Quadratic operators, 357-9, 361-2. 

Quadrics: classification, 380-6; reduction to principal axes, 364-7.

Quaternions, 265. 

Rank: of a bilinear form, 356; of a linear transformation, 277; of a matrix, 136-40; properties, 158-63, 169; of a quadratic form, 360; rank theorem, 139; rank-multiplicity theorem, 214.

Rational: functions of matrices, 99-100, 201-2, 204-5, 335; reduction, 185.

Reduction: of bilinear forms, 368-9; 
of hermitian forms, 387; Lagrange’s 
method, 371-4; of matrices to 
normal form, 173-6; orthogonal 
reduction of quadratic forms, 362-3; 
orthogonal reduction of symmetric 
matrices, 302-4; of quadratic forms, 
369-74; of quadrics to principal axes, 
364-7; rational, 185; simultaneous 
reduction of two hermitian forms, 
415; simultaneous reduction of two 
quadratic forms, 408-15. 

Redundant equations, 131. 

Reference field, 40. 

Regular characteristic roots, 294. 

Replacement theorem of Steinitz, 53 
(footnote). 

Representation: of bilinear operators by bilinear forms, 354; of groups by matrices, 267-72; of linear manifolds by vector spaces, 60; of linear operators by matrices, 114-22; polar, 425; of quadratic forms as sums of squares, 370-1; of quadratic operators by quadratic forms, 358; of rotations by orthogonal matrices, 233-43, 246-7; of rotations by skew-symmetric matrices, 243-5.

Rigid motion, 246-7. 

Rotation: axis, 240; infinitesimal, 249; in the plane, 233-6; representation by orthogonal matrices, 233-43, 246-7; representation by skew-symmetric matrices, 243-5; rotation group, 263; in space, 236-47.

Row: matrices and vectors, 77; rank, 
137. 

Scalar: matrix, 76; product, see Inner 
product. 

Scalars, 40. 


Schmidt’s orthogonalization process, 
67-68. 

Schur: inequality for characteristic 
roots, 309-11; inequality for positive 
definite matrices, 422; theorem on 
triangular canonical forms, 307. 

Schur and Toeplitz, theorem on normal 
matrices, 305. 

Secular equation, 196.

Semi-definite forms, 394, 405-7. 

Separation, 63. 

Signature, 378-80, 387-8, 398. 

Signum, 3. 

Similar matrices, 119. 

Similarity: group, 266; orthogonal and 
unitary, 266; transformation, 119. 

Simultaneous: equations, see Linear equations; reduction of two hermitian forms, 415; reduction of two quadratic forms, 408-15; similarity transformations, 316-22.

Singular and non-singular matrices, 87. 

Singular matrices, groups of, 272-6. 

Skew-hermitian matrices, 209, 232. 

Skew-symmetric matrices, 208, 227,
243-5. 

Small vibrations, 415-16. 

Square matrices, 75. 

Steinitz’s replacement theorem, 63 
(footnote). 

Sub-: group, 254; manifold, 45; matrix, 
136; space, 45. 

Sylvester, interpolation formula, 221; 
law of inertia, 377; law of nullity, 162. 

Symmetric: bilinear operators, 361; 
group, 257; matrix, 183. 

Toeplitz, see Schur. 

Total vector space, 42. 

Trace, 198. 

Transformations: congruence, 182-5, 266, 358-60; definition, 113; equivalence, 177; linear, 114-22, 277, 356, 360; similarity, 119.

Transposition: of arrangements, 3, 
258; of matrices, 76, 97. 

Triangle inequality, 64. 

Triangular: canonical forms, 306-9; 
matrices, 76. 

Unit: matrix, 75; quadratic form, 362; 
vector, 64. 

Unitary: group, 263; matrices, 229-32; 
reduction of hermitian forms, 387; 
similarity, 266. 



Value classes: definition, 394; determinantal criteria, 400-7.

Vandermonde determinant, 17-18. 

Vector(s): addition, 41; characteristic, 214; column type and row type, 77; definition, 40; length, 63; linear combination, 42; linear dependence, 48, 162-3; multiplication by scalars, 40; norm, 63; normalization, 64; orthogonal and orthonormal, 65; principal, 237; zero, 40.

Vector space: basis theorem, 62; definition, 42; generators, 43; spanned by given vectors, 43.

Weierstrass's theorem on simultaneous 
reduction of two forms, 410. 

Weyr’s theorem on matrix power 
series, 332. 

Zero: divisors of, 95-96, 163-4; element
of a linear manifold, 46; matrix, 75; 
vector, 40. 



PRINTED IN GREAT BRITAIN
AT THE UNIVERSITY PRESS, OXFORD
BY CHARLES BATEY
PRINTER TO THE UNIVERSITY




